
How-To Tutorials - Microservices

16 Articles

Yuri Shkuro on Observability challenges in microservices and cloud-native applications

Packt Editorial Staff
05 Apr 2019
11 min read
In the last decade, we saw a significant shift in how modern, internet-scale applications are being built. Cloud computing (infrastructure as a service) and containerization technologies (popularized by Docker) enabled a new breed of distributed system designs commonly referred to as microservices (and their next incarnation, FaaS). Successful companies like Twitter and Netflix have been able to leverage them to build highly scalable, efficient, and reliable systems, and to deliver more features faster to their customers. In this article we explain the concept of observability in microservices, its challenges and traditional monitoring tools in microservices. This article is an extract taken from the book Mastering Distributed Tracing, written by Yuri Shkuro. This book will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool. While there is no official definition of microservices, a certain consensus has evolved over time in the industry. Martin Fowler, the author of many books on software design, argues that microservices architectures exhibit the following common characteristics: Componentization via (micro)services Smart endpoints and dumb pipes Organized around business capabilities Decentralized governance Decentralized data management Infrastructure automation Design for failure Evolutionary design Because of the large number of microservices involved in building modern applications, rapid provisioning, rapid deployment via decentralized continuous delivery, strict DevOps practices, and holistic service monitoring are necessary to effectively develop, maintain, and operate such applications. The infrastructure requirements imposed by the microservices architectures spawned a whole new area of development of infrastructure platforms and tools for managing these complex cloud-native applications. In 2015, the Cloud Native Computing Foundation (CNCF) was created as a vendor-neutral home for many emerging open source projects in this area, such as Kubernetes, Prometheus, Linkerd, and so on, with a mission to "make cloud-native computing ubiquitous." Read more on Honeycomb CEO Charity Majors discusses observability and dealing with “the coming armageddon of complexity” [Interview] What is observability? The term "observability" in control theory states that the system is observable if the internal states of the system and, accordingly, its behavior, can be determined by only looking at its inputs and outputs. At the 2018 Observability Practitioners Summit, Bryan Cantrill, the CTO of Joyent and one of the creators of the tool dtrace, argued that this definition is not practical to apply to software systems because they are so complex that we can never know their complete internal state, and therefore the control theory's binary measure of observability is always zero (I highly recommend watching his talk on YouTube: https://youtu.be/U4E0QxzswQc). Instead, a more useful definition of observability for a software system is its "capability to allow a human to ask and answer questions". The more questions we can ask and answer about the system, the more observable it is. Figure 1: The Twitter debate There are also many debates and Twitter zingers about the difference between monitoring and observability. Traditionally, the term monitoring was used to describe metrics collection and alerting. 
Sometimes it is used more generally to include other tools, such as "using distributed tracing to monitor distributed transactions." The definition by Oxford dictionaries of the verb "monitor" is "to observe and check the progress or quality of (something) over a period of time; keep under systematic review." However, it is better scoped to describing the process of observing certain a priori defined performance indicators of our software system, such as those measuring an impact on the end-user experience, like latency or error counts, and using their values to alert us when these signals indicate an abnormal behavior of the system. Metrics, logs, and traces can all be used as a means to extract those signals from the application. We can then reserve the term "observability" for situations when we have a human operator proactively asking questions that were not predefined. As Bryan Cantrill put it in his talk, this process is debugging, and we need to "use our brains when debugging." Monitoring does not require a human operator; it can and should be fully automated. "If you want to talk about (metrics, logs, and traces) as pillars of observability–great. The human is the foundation of observability!"  -- BryanCantrill In the end, the so-called "three pillars of observability" (metrics, logs, and traces) are just tools, or more precisely, different ways of extracting sensor data from the applications. Even with metrics, the modern time series solutions like Prometheus, InfluxDB, or Uber's M3 are capable of capturing the time series with many labels, such as which host emitted a particular value of a counter. Not all labels may be useful for monitoring, since a single misbehaving service instance in a cluster of thousands does not warrant an alert that wakes up an engineer. But when we are investigating an outage and trying to narrow down the scope of the problem, the labels can be very useful as observability signals. The observability challenge of microservices By adopting microservices architectures, organizations are expecting to reap many benefits, from better scalability of components to higher developer productivity. There are many books, articles, and blog posts written on this topic, so I will not go into that. Despite the benefits and eager adoption by companies large and small, microservices come with their own challenges and complexity. Companies like Twitter and Netflix were successful in adopting microservices because they found efficient ways of managing that complexity. Vijay Gill, Senior VP of Engineering at Databricks, goes as far as saying that the only good reason to adopt microservices is to be able to scale your engineering organization and to "ship the org chart". So, what are the challenges of this design? There are quite a few: In order to run these microservices in production, we need an advanced orchestration platform that can schedule resources, deploy containers, autoscale, and so on. Operating an architecture of this scale manually is simply not feasible, which is why projects like Kubernetes became so popular. In order to communicate, microservices need to know how to find each other on the network, how to route around problematic areas, how to perform load balancing, how to apply rate limiting, and so on. These functions are delegated to advanced RPC frameworks or external components like network proxies and service meshes. Splitting a monolith into many microservices may actually decrease reliability. 
Suppose we have 20 components in the application and all of them are required to produce a response to a single request. When we run them in a monolith, our failure modes are restricted to bugs and potentially a crash of the whole server running the monolith. But if we run the same components as microservices, on different hosts and separated by a network, we introduce many more potential failure points, from network hiccups to resource constraints due to noisy neighbors.

The latency may also increase. Assume each microservice has 1 ms average latency, but the 99th percentile is 1 s. A transaction touching just one of these services has a 1% chance of taking ≥ 1 s. A transaction touching 100 of these services has a 1 − (1 − 0.01)^100 ≈ 63% chance of taking ≥ 1 s.
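As a quick sketch of that arithmetic in code (this snippet is an illustration added here, not example code from the book), the request is slow whenever at least one of the services it touches hits its 99th-percentile latency:

```java
public class TailLatency {
    public static void main(String[] args) {
        double pSlow = 0.01;   // P(a single service takes >= 1s), i.e. its 99th percentile
        int services = 100;    // number of services touched by one request
        // The request is slow unless every service it touches responds quickly.
        double pRequestSlow = 1 - Math.pow(1 - pSlow, services);
        System.out.printf("P(request takes >= 1s) = %.2f%n", pRequestSlow); // ~0.63
    }
}
```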
Finally, the observability of the system is dramatically reduced if we try to use traditional monitoring tools. When we see that some requests to our system are failing or slow, we want our observability tools to tell us the story about what happens to that request.

Traditional monitoring tools

Traditional monitoring tools were designed for monolith systems, observing the health and behavior of a single application instance. They may be able to tell us a story about that single instance, but they know almost nothing about the distributed transaction that passed through it. These tools "lack the context" of the request.

Metrics

It goes like this: "Once upon a time…something bad happened. The end." How do you like this story? This is what the chart in Figure 2 tells us. It's not completely useless; we do see a spike and we could define an alert to fire when this happens. But can we explain or troubleshoot the problem?

Figure 2: A graph of two time series representing (hypothetically) the volume of traffic to a service

Metrics, or stats, are numerical measures recorded by the application, such as counters, gauges, or timers. Metrics are very cheap to collect, since numeric values can be easily aggregated to reduce the overhead of transmitting that data to the monitoring system. They are also fairly accurate, which is why they are very useful for the actual monitoring (as the dictionary defines it) and alerting. Yet the same capacity for aggregation is what makes metrics ill-suited for explaining the pathological behavior of the application. By aggregating data, we are throwing away all the context we had about the individual transactions.

Logs

Logging is an even more basic observability tool than metrics. Every programmer learns their first programming language by writing a program that prints (that is, logs) "Hello, World!" Similar to metrics, logs struggle with microservices because each log stream only tells us about a single instance of a service. However, evolving programming paradigms create other problems for logs as a debugging tool. Ben Sigelman, who built Google's distributed tracing system Dapper, explained it in his KubeCon 2016 keynote talk as four types of concurrency (Figure 3):

Figure 3: Evolution of concurrency

Years ago, applications like early versions of Apache HTTP Server handled concurrency by forking child processes and having each process handle a single request at a time. Logs collected from that single process could do a good job of describing what happened inside the application. Then came multi-threaded applications and basic concurrency. A single request would typically be executed by a single thread sequentially, so as long as we included the thread name in the logs and filtered by that name, we could still get a reasonably accurate picture of the request execution.

Then came asynchronous concurrency, with asynchronous and actor-based programming, executor pools, futures, promises, and event-loop-based frameworks. The execution of a single request may start on one thread, then continue on another, and finish on a third. In the case of event-loop systems like Node.js, all requests are processed on a single thread, but when the execution tries to make an I/O call, it is put in a wait state, and when the I/O is done, the execution resumes after waiting its turn in the queue. Both of these asynchronous concurrency models result in each thread switching between multiple different requests that are all in flight. Observing the behavior of such a system from the logs is very difficult, unless we annotate all logs with some kind of unique id representing the request rather than the thread, a technique that actually gets us close to how distributed tracing works.

Finally, microservices introduced what we can call "distributed concurrency." Not only can the execution of a single request jump between threads, but it can also jump between processes, when one microservice makes a network call to another. Trying to troubleshoot request execution from such logs is like debugging without a stack trace: we get small pieces, but no big picture. In order to reconstruct the flight of the request from the many log streams, we need powerful log aggregation technology and a distributed context propagation capability to tag all those logs in different processes with a unique request id that we can use to stitch those requests together. We might as well be using a real distributed tracing infrastructure at this point! Yet even after tagging the logs with a unique request id, we still cannot assemble them into an accurate sequence, because the timestamps from different servers are generally not comparable due to clock skews.

In this article we looked at the concept of observability and some challenges one has to face in microservices. We further discussed traditional monitoring tools for microservices. Applying distributed tracing to microservices-based architectures will be easy with Mastering Distributed Tracing, written by Yuri Shkuro.

6 Ways to blow up your Microservices!
Have Microservices killed the monolithic architecture? Maybe not!
How to build Dockers with microservices


Why moving from a monolithic architecture to microservices is so hard, Gitlab’s Jason Plum breaks it down [KubeCon+CNC Talk]

Amrata Joshi
19 Dec 2018
12 min read
Last week, at KubeCon + CloudNativeCon North America 2018, Jason Plum, Senior Software Engineer, Distribution at GitLab, spoke about GitLab, Omnibus, and the concept of a monolith and its downsides. He has spent the last year working on the cloud native Helm charts and breaking out a complicated pile of code. This article highlights a few insights from Jason Plum's talk, Monolith to Microservice: Pitchforks Not Included, at KubeCon + CloudNativeCon.

Key takeaways

"You could not have seen the future that you live in today, learn from what you've got in the past, learn what's available now and work your way to it." - Jason Plum

GitLab's beginnings as a monolithic project provided the means for focused acceleration and innovation. The need to scale better and faster than the traditional models allowed caused the team to reflect on its choices, as it needed to grow beyond the current architecture to keep up. New ways of doing things require new ways of looking at them. Be open minded, and remember that your correct choices in the past could not see the future you live in.

"So the real question people don't realize is what is GitLab?" - Jason Plum

GitLab is the first single application to have the entire DevOps lifecycle in a single interface.

Omnibus - The journey from a package to a monolith

"We had a group of people working on a single product to binding that and then we took that, we bundled that. And we shipped it and we shipped it and we shipped it and we shipped it and all the twenties every month for the entire lifespan of this company we have done that, that's not been easy. Being a monolith made that something that was simple to do at scale." - Jason Plum

In the beginning it was simple, as Ruby on Rails was a single codebase and users had to deploy it from source. Just one gigantic codebase was used, but that's not the case these days. Ruby on Rails is still used for the primary application, but now a shim proxy called Workhorse takes the heavy lifting away from Ruby. It ensures that users and their APIs are responsive. The team at GitLab started packaging this because doing everything from source was difficult. They created the Omnibus package, which eventually became the gigantic monolith.

Monoliths make sense because:

Adding features is simple
It's easy, as everything is one bundle
Clear focus for a Minimum Viable Product (MVP)

Advantages of Omnibus:

Full-stack bundle provides all components necessary to use every feature of GitLab
Simple to install
Components can be individually enabled/disabled
Easy to distribute
Highly controlled, version-locked components
Guaranteed configuration stability

The downsides of monoliths

"The problem is this thing is massive" - Jason Plum

The Omnibus package can work on any platform, any cloud, and under any distribution. But the question is: how many of us would want to manage fleets of VMs? This package has grown so much that it is 1.5 gigabytes unpacked. It has all the features and is still usable. If a user downloads a 500-megabyte installation package, it unpacks to almost a gigabyte and a half. This package contains everything that is required to run the SaaS, but the problem is that this package is massive.

"The trick is Git itself is the reason that moving to cloud native was hard." - Jason Plum

While using Git, users run a couple of commands, push them, and deploy the app. But at the core of those commands is how everything is handled and how everything is put together. Git works with snapshots of the entire file.
The number of files include, every file the user has and every version the user had. It also involves all the indexes and references and some optimizations. But the problem is the more the files, the harder it gets. “Has anybody ever checked out the Linux tree? You check out that tree, get your coffee, come back check out, any branch I don't care what it is and then dip that against current master. How many files just got read on the file system?” - Jason Plum When you come back you realize that all the files that are marked as different and between the two of them when you do diff, that information is not stored, it's not greeting and it is not even cutting it out. It is running differently on all of those files. Imagine how bad that gets when you have 10 million lines of code in a repository that's 15 years old ?  That’s expensive in terms of performance.  - Jason Plum Traditional methods - A big problem “Now let's actually go and make a branch make some changes and commit them right. Now you push them up to your fork and now you go into add if you on an M R. Now it's my job to do the thing that was already hard on your laptop, right? Okay cool, that's one of you, how about 10,000 people a second right do you see where this is going? Suddenly it's harder but why is this the problem?” - Jason Plum The answer is traditional methods, as they are quite slow. If we have hundreds of things in the fleet, accessing tens of machines that are massive and it still won’t work because the traditional methods are a problem. Is NFS a solution to this problem? NFS (Network File System) works well when there are just 10 or 100 people. But if a user is asked to manage an NFS server for 5,000 people, one might rather choose pitchfork. NFS is capable but it can’t work at such a scale. The Git team now has a mount that has to be on every single node, as the API code and web code and other processes which needs to be functional enough to read the files. The team has previously used Garrett, Lib Git to read the files on the file system. Every time, one reads the file, the whole file used to get pulled. This gave rise to another problem, disk i/o problems. Since, everybody tries to read the disparate set of files, the traffic increases. “Okay so we have definitely found a scaling limit now we can only push the traditional methods of up and out so far before we realize that that's just not going to work because we don't have big enough pipes, end of line. So now we've got all of this and we've just got more of them and more of them and more of them. And all of a sudden we need to add 15 nodes to the fleet and another 15 nodes to the fleet and another 15 nodes to the fleet to keep up with sudden user demand. With every single time we have to double something the choke points do not grow - they get tighter and tighter” - Jason Plum The team decided to take a second look at the problem and started working on a project called Gitaly. They took the API calls that the users would make to live Git. So the Git mechanics was sent over a GRPC and then Gitaly was put on the actual file servers. Further the users were asked to call for a diff on whatever they want and then Gitaly was asked for the response. There is no need of NFS now. “I can send a 1k packet get a 4k response instead of NFS and reading 10,000 files. 
We centralized everything across and this gives us the ability to actually meet throughput because that pipe that's not getting any bigger suddenly has 1/10 of the traffic going through it.” - Jason Plum This leaves more space for users to easily get to the file servers and further removes the need of NFS mounts for everything. Incase one node is lost then half of the fleet is not lost in an instant. How is Gitaly useful? With Gitaly the throughput requirement significantly reduced. The service nodes no more need disk access. It provides optimization for specific problems. How to solve Git’s performance related issue? For better optimization and performance it is important to treat it like a service or like a database. The file system is still in use and all of the accesses to the files are on the node where we have the best performance and best caching and there is no issue with regards to the network. “To take the monolith and rip a chunk out make it something else and literally prop the thing up, but how long are we going to be able to do this?” - Jason Plum If a user plans to upload something then he/she has to use a file system and which means that NFS hasn't gone away. Do we really need to have NFS because somebody uploaded a cat picture? Come on guys we can do better than that right?- Jason Plum The next solution was to take everything as a traditional file that does not get and move into object store as an option. This matters because there is no need to have a file system locally. The files can be handed over to a service that works well. And it could run on Prem in a cloud and can be handled by any number of men and service providers. Pets cattle is a popular term by CERN which means anything that can be replaced easily is cattle and anything that you have to care and feed for on a regular basis is a pet. The pet could be the stateful information, for example, database. The problem can be better explained with configuring the Omnibus at scale. If there are  hundreds of the VM’s and they are getting installed, further which the entire package is getting installed. So now there are 20 gigabytes per VM. The package needs to be downloaded for all the VM’s which means almost 500 megabytes. All the individual components can be configured out of the Omnibus. But even the load gets spreaded, it will still remain this big. And each of the nodes will at least take two minutes to come up from. So to speed up this process, the massive stack needs to be broken down into chunks and containers so they can be treated as individualized services. Also, there is no need of NFS as the components are no longer bound to the NFS disk. And this process would now take just five seconds instead of two minutes. A problem called legacy debt, a shared file system expectation which was a bugger. If there are separate containers and there is no shared disk then it could again give rise to a problem. “I can't do a shared disk because if we do shared disk through rewrite many. What's the major provider that will do that for us on every platform, anybody remember another three-letter problem.” - Jason Plum Then there came an interesting problem called workhorse, a smart proxy that talks to the UNIX sockets and not TCP. Though this problem got fixed. 
Time constraints - another problem “We can't break existing users and we can't have hiccups we have to think about everything ahead of time plan well and execute.” - Jason Plum Time constraints is a serious problem for a project’s developers, the development resources milestones, roadmaps deliverables. The new features would keep on coming into the project. The project would keep on functioning in the background but the existing users can’t be kept waiting. Is it possible to define individual component requirements? “Do you know how much CPU you need when idle versus when there's 10 people versus literally some guy clicking around and if files because he's one to look at what the kernel would like in 2 6 2 ?”- Jason Plum Monitoring helps to understand the component requirements. Metrics and performance data are few of the key elements for getting the exact component requirements. Other parameters like network, throughput, load balance, services etc also play an important role. But the problem is how to deal with throughput? How to balance the services? How to ensure that those services are always up? Then the other question comes up regarding the providers and load balancers as everyone doesn’t want to use the same load balancers or the same services. The system must support all the load balancers from all the major cloud providers and which is difficult. Issues with scaling “Maybe 50 percent for the thing that needs a lot of memory is a bad idea. I thought 50 percent was okay because when I ran a QA test against it, it didn't ever use more than 50 percent of one CPU. Apparently when I ran three more it now used 115 percent and I had 16 pounds and it fell over again.” - Jason Plum It's important to know what things needs to be scaled horizontally and which ones needs to be scaled vertically. To go automated or manual is also a crucial question. Also, it is equally important to understand which things should be configurable and how to tweak them as the use cases may vary from project to project. So, one should know how to go about a test and how to document a test. Issues with resilience “What happens to the application when a node, a whole node disappears off the cluster? Do you know how that behaves?” - Jason Plum It is important to understand which things shouldn't be on the same nodes. But the problem is how to recover it. These things are not known and by the time one understands the problem and the solution, it is too late. We need new ways of examining these issues and for planning the solution. Jason’s insightful talk on Monolith to Microservice gives a perfect end to the KubeCon + CloudNativeCon and is a must watch for everyone. Kelsey Hightower on Serverless and Security on Kubernetes at KubeCon + CloudNative RedHat contributes etcd, a distributed key-value store project, to the Cloud Native Computing Foundation at KubeCon + CloudNativeCon Oracle introduces Oracle Cloud Native Framework at KubeCon+CloudNativeCon 2018


Have Microservices killed the monolithic architecture? Maybe not!

Aaron Lazar
04 Jun 2018
6 min read
Microservices have been growing in popularity since the past few years, 2014 to be precise. Honestly speaking they weren’t that popular until around 2016 - take a look at the steep rise in the curve. The outbreak has happened over the past few years and there are quite a few factors contributing to their growth, like the cloud, distributed architectures, etc. Source: Google Trends Microservices allow for a clearer and refined architecture, with services built to work in isolation, without affecting the resilience and robustness of the application in any way. But does that mean that the Monolith is dead and only Microservices reign? Let’s find out, shall we? Those of you who participated in this year’s survey, I thank you for taking the time out to share such valuable information. For those of you who don’t know what the survey is all about, it a thing that we do every year, where thousands of developers, architects, managers, admins, share their insights with us, and we share our findings with the community. This year’s survey was as informative as the last, if not more! We had developers tell us so much about what they’re doing, where they see technology heading and what tools and techniques they use to stay relevant at what they do. So we took the opportunity and asked our respondents a question about the topic under discussion. Source: WWE.com Revelations If I asked a developer in 2018, what they thought would be the response, they’d instantly say that a majority would be for microservices. Source: Packtpub Skill Up Survey 2018 If you were the one who guessed the answer was going to be Yes, give yourself a firm pat on the back! It’s great to see that 1,603 people are throwing their hands up in the air and building microservices. On the other hand, it’s possible that it’s purely their manager’s decision (See how this forms a barrier to achieving business goals). Anyway, I was particularly concerned about the remaining 314 people who said ‘No’ (those who skipped answering, now is your chance to say something in the comments section below!). Why no Microservices? I thought I’d analyse the possibilities as to why one wouldn’t want to use the microservices pattern in their application architecture. It’s not like developers are migrating from monoliths to microservices, just because everyone else is doing it. Like any other architectural decision, there are several factors that need to be taken into consideration before making the switch. So here’s what I thought were some reasons why developers are sticking to monoliths. #1 One troll vs many elves: Complex times Well imagine you could be attacked by one troll or a hundred house elves. Which situation would you choose to be in if neither isn’t an option? I don’t know about you, but I’d choose the troll any day! Keeping the troll’s size aside, I’d be better off knowing I had one large enemy in front of me, rather than being surrounded by a hundred miniature ones. The same goes for microservices. More services means more complexity, more issues that could crop up. For developers, more services means that they would need to run or connect to all of them on their machine. Although there are tools that help solve this problem, you have to admit that it’s a task to run all services together as a whole application. On the other hand, Ops professionals are tasked to monitor and keep all these services up and running. 
#2 We lack the expertise Let alone having Developer Rockstars or Admin Ninjas (Oops, I shouldn’t be using those words now, find out why), if your organisation lacks experienced professionals, you’ve got a serious problem. What if there’s an organisation that has been having issues developing/managing a monolith itself. There’s no guarantee that they will be able to manage a microservices based application more effectively. It’s a matter of the organisation having enough hands on skills needed to perform these tasks. These skills are tough to acquire and it’s not simple for organisations to find the right talent. #3 Tower of Babel: Communication gaps In a monolith, communication happens within the application itself and the network channels exist internally. However, this isn’t the case for a microservices architecture as inter-service communication is necessary to keep everything running in tandem. This results in the generation of multiple points of failure, complicating things. To minimise failure, each service has a certain number of retries when trying to establish communication with another. When scaled up, these retries add a load on the database, what with communication formats having to follow strict rules to avoid complexity back again. It’s a vicious circle! #4 Rebuilding a monolith When you build an application based on the microservices architecture, you may benefit a great deal from robustness and reliability. However, microservices together form a large, complicated system, which can be managed by orchestration platforms like Kubernetes. Although, if individual teams are managing clusters of these services, it’s quite likely that orchestration, deployment and management of such a system will be a pain. #5 Burning in dependency hell Microservices are notorious for inviting developers to build services in various languages and then to glue them together. While this is an advantage to a certain extent, it complicates dependency management in the entire application. Moreover, dependencies get even more complicated when versions of tools don’t receive instantaneous support as they are updated. You and your team can go crazy keeping track of versions and dependencies that need to be managed to maintain smooth functioning of your application. So while the microservice architecture is hot, it is not always the best option and teams can actually end up making things worse if they choose to make the change unprepared. Yes, the cloud does benefit much more when applications are deployed as services, rather than as a monolith, but the renowned/infamous “lift and shift” method still exists and works when needed. Ultimately, if you think past the hype, the monolith is not really dead yet and is in fact still being deployed and run in several organisations. Finally, I want to stress that it’s critical that developers and architects take a well informed decision, keeping in mind all the above factors, before they choose an architecture. Like they say, “With great power comes great responsibility”, that’s exactly what great architecture is all about, rather than just jumping on the bandwagon. Building Scalable Microservices Why microservices and DevOps are a match made in heaven What is a multi layered software architecture?


What is domain driven design?

Packt Editorial Staff
03 Apr 2018
18 min read
Domain driven design exists because all software exists for a purpose. It does something. For example, you can't provide a software solution for a financial system such as online stock trading if you don't understand the stock exchanges and their functioning. Having domain knowledge is essential to solving problems with software. Domain driven design is simply designing software with the specific domain - whether that's finance, medicine, law, eCommerce - in mind. This has been taken from Mastering Microservices with Java 9 - Second Edition.

Central to Domain Driven Design is the concept of a model. A model is an abstraction, or a blueprint, of the domain.

Domain driven design is a collaborative activity

Designing this model is not rocket science, but it does take a lot of effort, refining, and input from domain experts. It is the collective job of software designers, domain experts, and developers. They organize information, divide it into smaller parts, group them logically, and create modules. Each module can be taken up individually, and can be divided using a similar approach. This process can be followed until we reach the unit level, or until we cannot divide it any further. A complex project may have more such iterations; similarly, a simple project could have just a single iteration.

Once a model is defined and well documented, it can move on to the next stage - code design. So, here we have a software design: a domain model, a code design, and a code implementation of the domain model. The domain model provides a high-level view of the architecture of a solution (software/application), and the code implementation gives the domain model life, as a working model.

Domain Driven Design makes design and development work together. It provides the ability to develop software continuously, while keeping the design up to date based on feedback received from the development. It addresses one of the limitations of Agile and Waterfall methodologies by keeping the software, including its design and code, maintainable, as well as keeping the application minimally viable. It gives developers the right platform to understand the domain, and provides the opportunity to share early feedback on the domain model implementation. It removes the bottleneck that appears in later stages when stakeholders wait for deliverables.

The fundamental components of Domain Driven Design

To understand domain driven design, you can break it down into three fundamental concepts:

Ubiquitous language and Unified Modeling Language (UML)
Multilayered architecture
Artifacts (components)

Ubiquitous language

Ubiquitous language is a common language used to communicate within a project. Because designing a model is a collaborative effort of software designers, domain experts, and developers, it requires a common language to communicate with. It removes misunderstandings and misinterpretations. Communication gaps so often lead to bad software - ubiquitous language minimizes these gaps. It does, however, need to be used everywhere on a project. Unified Modeling Language (UML) is widely used and very popular when creating models. It also has a few limitations; for example, when you have thousands of classes drawn on paper, it's difficult to represent class relationships and simultaneously understand their abstraction while taking meaning from it. Also, UML diagrams do not represent the concepts of a model and what objects are supposed to do.
Therefore, UML should always be used with other documents, code, or any other reference for effective communication. Multilayered architecture Multilayered architecture is a common solution for Domain Driven Design. It contains four layers: Presentation layer or (UI) Application layer - responsible for application logic. It maintains and coordinates the overall flow of the product/service. It does not contain business logic or UI. It may hold the state of application objects, like tasks in progress. Domain layer - contains the domain information and business logic. It holds the state of the business object. Infrastructure layer -  provides support to all the other layers and is responsible for communication between them. To understand the interaction of the different layers, take the example of table booking at a restaurant. The end user places a request for a table booking using UI. The UI passes the request to the application layer. The application layer fetches the domain objects, such as the restaurant, the table, a date, and so on, from the domain layer. The domain layer fetches these existing persisted objects from the infrastructure, and invokes relevant methods to make the booking and persist them back to the infrastructure layer. Once domain objects are persisted, the application layer shows the booking confirmation to the end user. Artifacts used in Domain Driven Design There are seven different artifacts used in Domain Driven Design to express, create, and retrieve domain models: Entities Value objects Services Aggregates Repository Factory Module Entities are certain types of objects that are identifiable and remain the same throughout the states of the products/services. These objects are not identified by their attributes, but by their identity and thread of continuity. These type of objects are known as entities. It sounds pretty simple, but it carries complexity. You need to understand how we can define the entities. Let's take an example of a table booking system, where we have a restaurant class with attributes such as restaurant name, address, phone number, establishment data, and so on. We can take two instances of the restaurant class that are not identifiable using the restaurant name, as there could be other restaurants with the same name. Similarly, if we go by any other single attribute, we will not find any attributes that can singularly identify a unique restaurant. If two restaurants have all the same attribute values, they are therefore the same and are interchangeable with each other. Still, they are not the same entities, as both have different references (memory addresses). Conversely, let's take a class of U.S. citizens. Every U.S. citizen has his or her own social security number. This number is not only unique, but remains unchanged throughout the life of the citizen and assures continuity. This citizen object would exist in the memory, would be serialized, and would be removed from the memory and stored in the database. It even exists after the person is deceased. It will be kept in the system for as long as the system exists. A citizen's social security number remains the same irrespective of its representation. Therefore, creating entities in a product means creating an identity. So, now give an identity to any restaurant in the previous example, then either use a combination of attributes such as restaurant name, establishment date, and street, or add an identifier such as restaurant_id to identify it. 
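To make the restaurant example above concrete, here is a minimal sketch (not code from the book; the class and field names are illustrative) of an entity whose equality is based on an explicit identifier rather than on its attributes:

```java
import java.util.Objects;

// Entity: identity comes from restaurantId, not from name/address attributes.
public class Restaurant {
    private final String restaurantId;   // explicit identifier, e.g. a UUID or database key
    private String name;
    private String address;

    public Restaurant(String restaurantId, String name, String address) {
        this.restaurantId = restaurantId;
        this.name = name;
        this.address = address;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Restaurant)) return false;
        return restaurantId.equals(((Restaurant) o).restaurantId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(restaurantId);
    }
}
```

With this design, two Restaurant instances that happen to share the same name and address are still different entities unless they carry the same restaurantId.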
The basic rule is that two identifiers cannot be the same. Therefore, when we introduce an identifier for an entity, we need to be sure of it. There are different ways to create a unique identity for objects, described as follows: Using the primary key in a table. Using an automated generated ID by a domain module. A domain program generates the identifier and assigns it to objects that are being persisted among different layers. A few real-life objects carry user-defined identifiers themselves. For example, each country has its own country codes for dialing ISD calls. Composite key. This is a combination of attributes that can also be used for creating an identifier, as explained for the preceding restaurant object. Value objects Value objects (VOs) simplify the design. In contrast to entities, value objects have only attributes and no conceptual identity. A best practice is to keep value objects as immutable objects. If possible, you should even keep entity objects immutable too. You might want to keep all objects as entities, but you're likely to run into problems if you do this; there has to be one instance for each object. Let's say you are creating customers as entity objects. Each customer object would represent the restaurant guest; this cannot be used for booking orders for other guests. This may create millions of customer entity objects in the memory if millions of customers are using the system. Not only are there millions of uniquely identifiable objects that exist in the system, but each object is being tracked. Tracking as well as creating an identity is complex. A highly credible system is required to create and track these objects, which is not only very complex, but also resource heavy. It may result in system performance degradation. Therefore, it is important to use value objects instead of using entities. The reasons are explained in the next few paragraphs. Applications don't always need to have to be trackable and have an identifiable customer object. There are cases when you just need to have some or all attributes of the domain element. These are the cases when value objects can be used by the application. It makes things simple and improves the performance. Value objects can easily be created and destroyed, owing to the absence of identity. This simplifies the design—it makes value objects available for garbage collection if no other object has referenced them. Value objects should be designed and coded as immutable. Once they are created, they should never be modified during their life-cycle. If you need a different value of the VO, or any of its objects, then simply create a new value object, but don't modify the original value object. Here, immutability carries all the significance from object-oriented programming (OOP). A value object can be shared and used without impacting on its integrity if, and only if, it is immutable. Services While creating the domain model, you may come across situations where behavior may not be related to any object. These behaviors can be accommodated in service objects. Service objects are part of the domain layer and do not have any internal state. The sole purpose of service objects is to provide behavior to the domain that does not belong to a single entity or value object. Ubiquitous language helps you to identify different objects, identities, or value objects with different attributes and behaviors during the process of domain driven design and domain modelling. 
During the course of creating the domain model, you may find different behaviors or methods that do not belong to any specific object. Such behaviors are important, and so cannot be neglected. Neither can you add them to entities or value objects. It would spoil the object to add behavior that does not belong to it. Keep in mind, that behavior may impact on various objects. The use of object-oriented programming makes it possible to attach to some objects; this is known as a service. Services are common in technical frameworks. These are also used in domain layers in domain driven design. A service object does not have any internal state; its only purpose is to provide a behavior to the domain. Service objects provide behaviors that cannot be related to specific entities or value objects. Service objects may provide one or more related behaviors to one or more entities or value objects. It is a practice to define the services explicitly in the domain model. While creating the services, you need to tick all of the following points: Service objects' behavior performs on entities and value objects, but it does not belong to entities or value objects Service objects' behavior state is not maintained, and hence, they are stateless Services are part of the domain model Services may also exist in other layers. It is very important to keep domain-layer services isolated. It removes the complexities and keeps the design decoupled. Let's take an example where a restaurant owner wants to see the report of his monthly table bookings. In this case, he will log in as an admin and click the Display Report button after providing the required input fields, such as duration. Application layers pass the request to the domain layer that owns the report and templates objects, with some parameters such as report ID, and so on. Reports get created using the template, and data is fetched from either the database or other sources. Then the application layer passes through all the parameters, including the report ID to the business layer. Here, a template needs to be fetched from the database or another source to generate the report based on the ID. This operation does not belong to either the report object or the template object. Therefore, a service object is used that performs this operation to retrieve the required template from the database. Aggregates Aggregate domain pattern is related to the object's life cycle. It defines ownership and boundaries which is crucial in Domain Driven Design When you reserve a table at your favorite restaurant online using an application, you don't need to worry about the internal system and process that takes place to book your reservation, including searching for available restaurants, then for available tables on the given date, time, and so on and so forth. Therefore, you can say that a reservation application is an aggregate of several other objects, and works as a root for all the other objects for a table reservation system. This root should be an entity that binds collections of objects together. It is also called the aggregate root. This root object does not pass any reference of inside objects to external worlds, and protects the changes performed within internal objects. We need to understand why aggregators are required. A domain model can contain large numbers of domain objects. The bigger the application functionalities and size and the more complex its design, the greater number of objects present. A relationship exists between these objects. 
Some may have a many-to-many relationship, a few may have a one-to-many relationship, and others may have a one-to-one relationship. These relationships are enforced by the model implementation in the code, or in the database that ensures that these relationships among the objects are kept intact. Relationships are not just unidirectional; they can also be bidirectional. They can also increase in complexity. The designer's job is to simplify these relationships in the model. Some relationships may exist in a real domain, but may not be required in the domain model. Designers need to ensure that such relationships do not exist in the domain model. Similarly, multiplicity can be reduced by these constraints. One constraint may do the job where many objects satisfy the relationship. It is also possible that a bidirectional relationship could be converted into a unidirectional relationship. No matter how much simplification you input, you may still end up with relationships in the model. These relationships need to be maintained in the code. When one object is removed, the code should remove all the references to this object from other places. For example, a record removal from one table needs to be addressed wherever it has references in the form of foreign keys and such, to keep the data consistent and maintain its integrity. Also, invariants (rules) need to be forced and maintained whenever data changes. Relationships, constraints, and invariants bring a complexity that requires an efficient handling in code. We find the solution by using the aggregate represented by the single entity known as the root, which is associated with the group of objects that maintains consistency with regards to data changes. This root is the only object that is accessible from outside, so this root element works as a boundary gate that separates the internal objects from the external world. Roots can refer to one or more inside objects, and these inside objects can have references to other inside objects that may or may not have relationships with the root. However, outside objects can also refer to the root, and not to any inside objects. An aggregate ensures data integrity and enforces the invariant. Outside objects cannot make any change to inside objects; they can only change the root. However, they can use the root to make a change inside the object by calling exposed operations. The root should pass the value of inside objects to outside objects if required. If an aggregate object is stored in the database, then the query should only return the aggregate object. Traversal associations should be used to return the object when it is internally linked to the aggregate root. These internal objects may also have references to other aggregates. An aggregate root entity holds its global identity, and holds local identities inside their entities. A simple example of an aggregate in the table booking system is the customer. Customers can be exposed to external objects, and their root object contains their internal object address and contact information. When requested, the value object of internal objects, such as address, can be passed to external objects: Repository In a domain model, at a given point in time, many domain objects may exist. Each object may have its own life-cycle, from the creation of objects to their removal or persistence. Whenever any domain operation needs a domain object, it should retrieve the reference of the requested object efficiently. 
It would be very difficult if you didn't maintain all of the available domain objects in a central object. A central object carries the references of all the objects, and is responsible for returning the requested object reference. This central object is known as the repository. The repository is a point that interacts with infrastructures such as the database or file system. A repository object is the part of the domain model that interacts with storage such as the database, external sources, and so on, to retrieve the persisted objects. When a request is received by the repository for an object's reference, it returns the existing object's reference. If the requested object does not exist in the repository, then it retrieves the object from storage. For example, if you need a customer, you would query the repository object to provide the customer with ID 31. The repository would provide the requested customer object if it is already available in the repository, and if not, it would query the persisted stores such as the database, fetch it, and provide its reference. The main advantage of using the repository is having a consistent way to retrieve objects where the requestor does not need to interact directly with the storage such as the database. A repository may query objects from various storage types, such as one or more databases, filesystems, or factory repositories, and so on. In such cases, a repository may have strategies that also point to different sources for different object types As you can see in the repository object flow diagram on the right, the repository interacts with the infrastructure layer, and this interface is part of the domain layer. The requestor may belong to a domain layer, or an application layer. The repository helps the system to manage the life cycle of domain objects. Factory A factory is required when a simple constructor is not enough to create the object. It helps to create complex objects, or an aggregate that involves the creation of other related objects. A factory is also a part of the life cycle of domain objects, as it is responsible for creating them. Factories and repositories are in some way related to each other, as both refer to domain objects. The factory refers to newly created objects, whereas the repository returns the already existing objects either from the memory, or from external storage. Let's see how control flows, by using a user creation process application. Let's say that a user signs up with a username user1. This user creation first interacts with the factory, which creates the name user1 and then caches it in the domain using the repository, which also stores it in the storage for persistence. When the same user logs in again, the call moves to the repository for a reference. This uses the storage to load the reference and pass it to the requestor. The requestor may then use this user1 object to book the table in a specified restaurant, and at a specified time. These values are passed as parameters, and a table booking record is created in storage using the repository:       The factory may use one of the object-oriented programming patterns, such as the factory or abstract factory pattern, for object creation. Modules Modules are the best way to separate related business objects. These are best suited to large projects where the size of domain objects is bigger. For the end user, it makes sense to divide the domain model into modules and set the relationship between these modules. 
Once you understand the modules and their relationships, you start to see the bigger picture of the domain model, which makes it easier to drill down further and understand the model. Modules also help you to write code that is highly cohesive and loosely coupled. Ubiquitous language can be used to name these modules. For the table booking system, we could have different modules, such as user management, restaurants and tables, analytics and reports, reviews, and so on.

This introduction to domain driven design should give you a strong foundation for using it when you build software. Its principles are useful - in particular, making sure you collaborate and use the same language as different stakeholders is one of domain driven design's most valuable contributions to the way we approach software development.


API Gateway and its Need

Packt
21 Feb 2018
9 min read
In this article by Umesh R Sharma, author of the book Practical Microservices, we will cover the API Gateway and its need, with simple and short examples. (For more resources related to this topic, see here.)

Dynamic websites show a lot on a single page, and there is a lot of information that needs to be shown on the page. The common success order summary page shows the cart details and the customer address. For this, the frontend has to fire different queries to the customer detail service and the order detail service. This is a very simple example of having multiple services on a single page. As a single microservice has to deal with only one concern, showing this much information results in many API calls for the same page. So, a website or mobile page can be very chatty in terms of displaying data on the same page.

Another problem is that, sometimes, a microservice talks over a protocol other than HTTP, such as Thrift, and so on. Outer consumers can't directly deal with a microservice in that protocol. Also, as a mobile screen is smaller than a web page, the data required by the mobile and desktop API calls is different. A developer would want to give less data to the mobile API, or have different versions of the API calls for mobile and desktop. So, you could face a problem such as this: each client calling different web services, the client having to keep track of each web service, and developers having to maintain backward compatibility because API URLs are embedded in clients such as the mobile app.

Why do we need the API Gateway?

All these preceding problems can be addressed with the API Gateway in place. The API Gateway acts as a proxy between the API consumer and the API servers. To address the first problem in that scenario, there will only be one call, such as /successOrderSummary, to the API Gateway. The API Gateway, on behalf of the consumer, calls the order and user detail services, then combines the results and serves them to the client. So basically, it acts as a facade for API calls, which may internally call many APIs. The API Gateway serves many purposes, some of which are as follows.

Authentication

API Gateways can take on the overhead of authenticating an API call from outside. After that, all the internal calls can skip the security check. If the request comes from inside the VPC, removing the check can decrease the network latency a bit and let the developer focus more on business logic than on security.

Different protocol

Sometimes, microservices can internally use different protocols to talk to each other; it can be Thrift, TCP, UDP, RMI, SOAP, and so on. For clients, there can be only one REST-based HTTP call. Clients hit the API Gateway with the HTTP protocol, and the API Gateway can make the internal calls in the required protocols and combine the results from all the web services in the end. It can respond to the client in the required protocol; in most cases, that protocol will be HTTP.

Load-balancing

The API Gateway can work as a load balancer to handle requests in the most efficient manner. It can keep track of the request load it has sent to different nodes of a particular service. A gateway should be intelligent enough to load balance between different nodes of a particular service. With NGINX Plus coming into the picture, NGINX can be a good candidate for the API Gateway. It has many of the features to address the problems that are usually handled by an API Gateway.
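As an illustration of the authentication point above, here is a minimal sketch of how such a check could live in the gateway itself, written as a Zuul pre-filter (Zuul is also the gateway used in the example later in this article; the X-Auth-Token header name and the isValid check are hypothetical placeholders, not part of the book's example):

```java
import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;
import javax.servlet.http.HttpServletRequest;

// Rejects requests that lack a valid auth token before they reach any microservice.
public class AuthPreFilter extends ZuulFilter {

    @Override
    public String filterType() { return "pre"; }     // runs before routing

    @Override
    public int filterOrder() { return 1; }

    @Override
    public boolean shouldFilter() { return true; }

    @Override
    public Object run() {
        RequestContext ctx = RequestContext.getCurrentContext();
        HttpServletRequest request = ctx.getRequest();
        String token = request.getHeader("X-Auth-Token"); // hypothetical header name
        if (token == null || !isValid(token)) {
            ctx.setSendZuulResponse(false);  // do not forward to the backing service
            ctx.setResponseStatusCode(401);
        }
        return null;
    }

    private boolean isValid(String token) {
        // Placeholder: verify the token against your identity provider.
        return !token.isEmpty();
    }
}
```

With Spring Cloud Netflix Zuul, registering this filter as a Spring bean is typically enough for it to be applied to every routed request, so the downstream services never see unauthenticated traffic.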
Request dispatching (including service discovery)
One of the main features of the gateway is to reduce the chatter between the client and the microservices. If the client needs data from several services, the gateway can initiate the calls to those microservices in parallel. From the client's side there is only one hit; the gateway calls all the required services, waits for their results, combines them, and sends the response back to the client. Reactive microservice designs can help you achieve this. Working with service discovery adds many extra capabilities. The discovery data can tell the gateway which node of a service is the master and which is the slave; the same applies to databases, so that write requests go to the master and read requests go to a slave. That is the basic rule, but users can apply many more rules based on the metadata available to the API Gateway. The gateway can also record the typical response time of each node of a service instance and route higher-priority API calls to the fastest-responding node. Again, the rules you can define depend on the API Gateway you are using and how it is implemented.

Response transformation
Being the first and single point of entry for all API calls, the API Gateway knows which type of client is calling: a mobile client, a web client, or another external consumer. It can make the internal calls on the client's behalf and shape the data differently for each type of client, as per its needs and configuration.

Circuit breaker
To handle partial failure, the API Gateway uses a technique called the circuit breaker pattern. A failure in one service can cause cascading failures through all the service calls in the stack. The API Gateway can watch a failure threshold for each microservice; if a service crosses that threshold, the gateway marks the circuit for that API as open and stops making calls to it for a configured time. Hystrix (by Netflix) serves this purpose efficiently; by default, it only considers opening the circuit once a minimum volume of requests (20) has been seen in its rolling statistics window and the failure rate crosses a threshold. Developers can also provide a fallback for an open circuit, which can be a dummy response. Once the API starts giving the expected results again, the gateway marks the circuit as closed. (A minimal sketch of this open/closed cycle appears after the pros and cons list below.)

Pros and cons of the API Gateway
Using an API Gateway has its own pros and cons. The previous sections already described the advantages; summarized as points, the pros are:

Microservices can focus on business logic
Clients can get all the data in a single hit
Authentication, logging, and monitoring can be handled by the API Gateway
It gives the flexibility to use completely independent protocols between clients and microservices
It can give tailor-made results, as per the client's needs
It can handle partial failure

In addition to the preceding pros, there are also trade-offs to using this pattern. The cons are:

It can cause performance degradation, because a lot happens at the API Gateway
A discovery service has to be implemented along with it
It can become a single point of failure
Managing routing is an overhead of the pattern
It adds an additional network hop to every call
Overall, it increases the complexity of the system
Too much logic implemented in the gateway leads to another dependency problem

So both aspects should be considered before using the API Gateway; including it in the system increases the cost as well.
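The following is a minimal sketch of the open/closed cycle described in the circuit breaker section above. It is illustrative Python only: the failure threshold, cool-down period, and fallback are made-up values, and this is not how Hystrix or any particular gateway implements it internally.

# Illustrative only: the circuit breaker open/closed cycle reduced to a few lines.
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures    # failures tolerated before opening
        self.reset_after = reset_after      # seconds to wait before retrying
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                return fallback()           # circuit open: don't even try the service
            self.opened_at = None           # cool-down elapsed: try the service again
        try:
            result = func()
            self.failures = 0               # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()    # too many failures: open the circuit
            return fallback()

# A gateway would keep one breaker per downstream service, for example:
# breaker.call(lambda: fetch_json(url), fallback=lambda: {"status": "unavailable"})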
Before putting effort, cost, and management into this pattern, it is recommended that you analyze how much you can gain from it.

Example of API Gateway
In this example, we will show only a sample product page that fetches data from the product detail service. The example could be extended in many directions, but our focus here is simply to show how the API Gateway pattern works, so we will keep it small and simple. It uses Zuul from Netflix as the API Gateway. Spring provides an integration for Zuul, so we will build this example with Spring Boot. For the sample API Gateway implementation, we will use http://start.spring.io/ to generate the initial template of our code. Spring Initializr is a Spring project that helps beginners generate basic Spring Boot code: set a minimal configuration and hit the Generate Project button. To set more specific details for the project, click the Switch to the full version button to see all the configuration settings.

Let's create a controller in the same package as the main application class and put the following code in the file:

@RestController
public class ProductDetailController {

    @Resource
    ProductDetailService pdService;

    @RequestMapping(value = "/product/{id}")
    public ProductDetail getAllProduct(@PathVariable("id") String id) {
        return pdService.getProductDetailById(id);
    }
}

The preceding code assumes that the pdService bean interacts with a Spring Data repository for product details and returns the result for the requested product ID. Another assumption is that this service runs on port 10000. Just to make sure everything is running, a hit on a URL such as http://localhost:10000/product/1 should return some JSON as the response.

For the API Gateway, we will create another Spring Boot application with Zuul support. Zuul can be activated by simply adding the @EnableZuulProxy annotation. The following is the code to start a simple Zuul proxy:

@SpringBootApplication
@EnableZuulProxy
public class ApiGatewayExampleInSpring {
    public static void main(String[] args) {
        SpringApplication.run(ApiGatewayExampleInSpring.class, args);
    }
}

Everything else is managed through configuration. In the application.properties file of the API Gateway, the content will be as follows:

zuul.routes.product.path=/product/**
zuul.routes.product.url=http://localhost:10000
ribbon.eureka.enabled=false
server.port=8080

With this configuration, we are defining a rule: for any request to a URL such as /product/xxx, pass the request on to http://localhost:10000. To the outside world, the URL is http://localhost:8080/product/1, which is internally forwarded to port 10000. If we defined a spring.application.name property as product in the product detail microservice, then we wouldn't need to define the URL path property here (zuul.routes.product.path=/product/**), as Zuul, by default, would map it to the /product URL. The gateway taken here as an example is not very intelligent, but Zuul is a very capable API Gateway: depending on the routes, filters, and caching defined in Zuul's properties, one can build a very powerful gateway.

Summary
In this article, you learned about the API Gateway, the need for it, and its pros and cons, with a code example.
Resources for Article:   Further resources on this subject: What are Microservices? [article] Microservices and Service Oriented Architecture [article] Breaking into Microservices Architecture [article]


Understanding Microservices

Packt
22 Jun 2017
19 min read
This article by Tarek Ziadé, author of the book Python Microservices Development explains the benefits and implementation of microservices with Python. While the microservices architecture looks more complicated than its monolithic counterpart, its advantages are multiple. It offers the following benefits. (For more resources related to this topic, see here.) Separation of concerns First of all, each microservice can be developed independently by a separate team. For instance, building a reservation service can be a full project on its own. The team in charge can make it in whatever programming language and database, as long as it has a well-documented HTTP API. That also means the evolution of the app is more under control than with monoliths. For example, if the payment system changes its underlying interactions with the bank, the impact is localized inside that service and the rest of the application stays stable and under control. This loose coupling improves a lot the overall project velocity as we're applying at the service level a similar philosophy than the single responsibility principle. The single responsibility principle was defined by Robert Martin to explain that a class should have only one reason to change - in other words, each class should be providing a single, well-defined feature. Applied to microservices, it means that we want to make sure that each microservice focuses on a single role. Smaller projects The second benefit is breaking the complexity of the project. When you are adding a feature to an application like the PDF reporting, even if you are doing it cleanly, you are making the base code bigger, more complicated and sometimes slower. Building that feature in a separate application avoids this problem, and makes it easier to write it with whatever tools you want. You can refactor it often and shorten your release cycles, and stay on the top of things. The growth of the application remains under your control. Dealing with a smaller project also reduces risks when improving the application: if a team wants to try out the latest programming language or framework, they can iterate quickly on a prototype that implements the same microservice API, try it out, and decide whether or not to stick with it. One real-life example in mind is the Firefox Sync storage microservice. There are currently some experiments to switch from the current Python+MySQL implementation to a Go based one that stores users data in standalone SQLite databases. That prototype is highly experimental, but since we have isolated the storage feature in a microservice with a well-defined HTTP API, it's easy enough to give it a try with a small subset of the user base. Scaling and deployment Last, having your application split into components makes it easier to scale depending on your constraints. Let's say you are starting to get a lot of customers that are booking hotels daily, and the PDF generation is starting to heat up the CPUs. You can deploy that specific microservice in some servers that have bigger CPUs. Another typical example is RAM-consuming microservices like the ones that are interacting with memory databases like Redis or Memcache. You could tweak your deployments consequently by deploying them on servers with less CPU and a lot more RAM. To summarize microservices benefits: A team can develop each microservice independently, and use whatever technological stack makes sense. They can define a custom release cycle. The tip of the iceberg is its language agnostic HTTP API. 
Developers break the application complexity into logical components. Each microservice focuses on doing one thing well. Since microservices are standalone applications, there's finer control over deployments, which makes scaling easier. Microservices architectures are good at solving a lot of the problems that may arise once your application starts to grow. However, we need to be aware of some of the new issues they also bring in practice.

Implementing microservices with Python
Python is an amazingly versatile language. As you probably already know, it's used to build many different kinds of applications, from simple system scripts that perform tasks on a server, to large object-oriented applications that run services for millions of users. According to a study conducted by Philip Guo in 2014, published on the Association for Computing Machinery (ACM) website, Python has surpassed Java in top U.S. universities and is the most popular language for learning computer science. This trend is also true in the software industry. Python now sits in the top 5 languages of the TIOBE index (http://www.tiobe.com/tiobe-index/), and it's probably even bigger in web development, since languages like C are rarely used as main languages to build web applications. However, some developers criticize Python for being slow and unfit for building efficient web services. Python is slow, and this is undeniable. But it is still a language of choice for building microservices, and many major companies are happily using it. This section will give you some background on the different ways you can write microservices using Python, some insights on asynchronous versus synchronous programming, and conclude with some details on Python performance. It's composed of the following parts:

The WSGI standard
Greenlet & Gevent
Twisted & Tornado
asyncio
Language performance

The WSGI standard
What strikes most web developers starting with Python is how easy it is to get a web application up and running. The Python web community has created a standard, inspired by the Common Gateway Interface (CGI), called the Web Server Gateway Interface (WSGI), which greatly simplifies how you can write a Python application whose goal is to serve HTTP requests. When your code uses that standard, your project can be executed by standard web servers like Apache or NGINX, using WSGI extensions like uwsgi or mod_wsgi. Your application just has to deal with incoming requests and send back JSON responses, and Python includes all that goodness in its standard library. You can create a fully functional microservice that returns the server's local time with a vanilla Python module of fewer than ten lines:

import json
import time

def application(environ, start_response):
    headers = [('Content-type', 'application/json')]
    start_response('200 OK', headers)
    return [bytes(json.dumps({'time': time.time()}), 'utf8')]

Since its introduction, the WSGI protocol became an essential standard, and the Python web community widely adopted it. Developers wrote middlewares, which are functions you can hook before or after the WSGI application function itself, to do something within the environment. Some web frameworks were created specifically around that standard, like Bottle (http://bottlepy.org) - and soon enough, every framework out there could be used through WSGI in one way or another. The biggest problem with WSGI, though, is its synchronous nature.
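As a quick aside, a module like the one above can be tried locally with the reference WSGI server that ships in the standard library. This is just a sketch, and it assumes the application function above was saved in a file named timeservice.py:

# Sketch: serving the WSGI application above with the standard library's
# reference server. The module name timeservice is an assumption.
from wsgiref.simple_server import make_server
from timeservice import application

with make_server("", 8000, application) as server:
    print("Serving on http://localhost:8000 ...")
    server.serve_forever()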
The application function you see above is called exactly once per incoming request, and when the function returns, it has to send back the response. That means that every time you are calling the function, it will block until the response is ready. And writing microservices means your code will be waiting for responses from various network resources all the time. In other words, your application will idle and just block the client until everything is ready. That's an entirely okay behavior for HTTP APIs. We're not talking about building bidirectional applications like web socket based ones. But what happens when you have several incoming requests that are calling your application at the same time? WSGI servers will let you run a pool of threads to serve several requests concurrently. But you can't run thousands of them, and as soon as the pool is exhausted, the next request will be blocking even if your microservice is doing nothing but idling and waiting for backend services responses. That's one of the reasons why non-WSGI frameworks like Twisted, Tornado and in Javascript land Node.js became very successful - it's fully async. When you're coding a Twisted application, you can use callbacks to pause and resume the work done to build a response. That means you can accept new requests and start to treat them. That model dramatically reduces the idling time in your process. It can serve thousands of concurrent requests. Of course, that does not mean the application will return each single response faster. It just means one process can accept more concurrent requests and juggle between them as the data is getting ready to be sent back. There's no simple way with the WSGI standard to introduce something similar, and the community has debated for years to come up with a consensus - and failed. The odds are that the community will eventually drop the WSGI standard for something else. In the meantime, building microservices with synchronous frameworks is still possible and completely fine if your deployments take into account the one request == one thread limitation of the WSGI standard. There's, however, one trick to boost synchronous web applications: greenlets. Greenlet & Gevent The general principle of asynchronous programming is that the process deals with several concurrent execution contexts to simulate parallelism. Asynchronous applications are using an event loop that pauses and resumes execution contexts when an event is triggered - only one context is active, and they take turns. Explicit instruction in the code will tell the event loop that this is where it can pause the execution. When that occurs, the process will look for some other pending work to resume. Eventually, the process will come back to your function and continue it where it stopped - moving from an execution context to another is called switching. The Greenlet project (https://github.com/python-greenlet/greenlet) is a package based on the Stackless project, a particular CPython implementation, and provides greenlets. Greenlets are pseudo-threads that are very cheap to instantiate, unlike real threads, and that can be used to call python functions. Within those functions, you can switch and give back the control to another function. The switching is done with an event loop and allows you to write an asynchronous application using a Thread-like interface paradigm. 
Here's an example from the Greenlet documentation def test1(x, y): z = gr2.switch(x+y) print z def test2(u): print u gr1.switch(42) gr1 = greenlet(test1) gr2 = greenlet(test2) gr1.switch("hello", " world") The two greenlets are explicitly switching from one to the other. For building microservices based on the WSGI standard, if the underlying code was using greenlets we could accept several concurrent requests and just switch from one to another when we know a call is going to block the request - like performing a SQL query. Although, switching from one greenlet to another has to be done explicitly, and the resulting code can quickly become messy and hard to understand. That's where Gevent can become very useful. The Gevent project (http://www.gevent.org/) is built on the top of Greenlet and offers among other things an implicit and automatic way of switching between greenlets. It provides a cooperative version of the socket module that will use greenlets to automatically pause and resume the execution when some data is made available in the socket. There's even a monkey patch feature that will automatically replace the standard lib socket with Gevent's version. That makes your standard synchronous code magically asynchronous every time it uses sockets - with just one extra line. from gevent import monkey; monkey.patch_all() def application(environ, start_response): headers = [('Content-type', 'application/json')] start_response('200 OK', headers) # ...do something with sockets here... return result This implicit magic comes with a price, though. For Gevent to work well, all the underlying code needs to be compatible with the patching Gevent is doing. Some packages from the community will continue to block or even have unexpected results because of this. In particular, if they use C extensions and bypass some of the features of the standard library Gevent patched. But for most cases, it works well. Projects that are playing well with Gevent are dubbed "green," and when a library is not functioning well, and the community asks its authors to "make it green," it usually happens. That's what was used to scale the Firefox Sync service at Mozilla for instance. Twisted and Tornado If you are building microservices where increasing the number of concurrent requests you can hold is important, it's tempting to drop the WSGI standard and just use an asynchronous framework like Tornado (http://www.tornadoweb.org/) or Twisted (https://twistedmatrix.com/trac/). Twisted has been around for ages. To implement the same microservices you need to write a slightly more verbose code: import time from twisted.web import server, resource from twisted.internet import reactor, endpoints class Simple(resource.Resource): isLeaf = True def render_GET(self, request): request.responseHeaders.addRawHeader(b"content-type", b"application/json") return bytes(json.dumps({'time': time.time()}), 'utf8') site = server.Site(Simple()) endpoint = endpoints.TCP4ServerEndpoint(reactor, 8080) endpoint.listen(site) reactor.run() While Twisted is an extremely robust and efficient framework, it suffers from a few problems when building HTTP microservices: You need to implement each endpoint in your microservice with a class derived from a Resource class, and that implements each supported method. For a few simple APIs, it adds a lot of boilerplate code. Twisted code can be hard to understand & debug due to its asynchronous nature. 
It's easy to fall into callback hell when you're chaining too many functions that are getting triggered successively one after the other - and the code can get messy Properly testing your Twisted application is hard, and you have to use Twisted-specific unit testing model. Tornado is based on a similar model but is doing a better job in some areas. It has a lighter routing system and does everything possible to make the code closer to plain Python. Tornado is also using a callback model, so debugging can be hard. But both frameworks are working hard at bridging the gap to rely on the new async features introduced in Python 3. asyncio When Guido van Rossum started to work on adding async features in Python 3, part of the community pushed for a Gevent-like solution because it made a lot of sense to write applications in a synchronous, sequential fashion - rather than having to add explicit callbacks like in Tornado or Twisted. But Guido picked the explicit technique and experimented in a project called Tulip that Twisted inspired. Eventually, asyncio was born out of that side project and added into Python. In hindsight, implementing an explicit event loop mechanism in Python instead of going the Gevent way makes a lot of sense. The way the Python core developers coded asyncio and how they elegantly extended the language with the async and await keywords to implement coroutines, made asynchronous applications built with vanilla Python 3.5+ code look very elegant and close to synchronous programming. By doing this, Python did a great job at avoiding the callback syntax mess we sometimes see in Node.js or Twisted (Python 2) applications. And beyond coroutines, Python 3 has introduced a full set of features and helpers in the asyncio package to build asynchronous applications, see https://docs.python.org/3/library/asyncio.html. Python is now as expressive as languages like Lua to create coroutine-based applications, and there are now a few emerging frameworks that have embraced those features and will only work with Python 3.5+ to benefit from this. KeepSafe's aiohttp (http://aiohttp.readthedocs.io) is one of them, and building the same microservice, fully asynchronous, with it would simply be these few elegant lines. from aiohttp import web import time async def handle(request): return web.json_response({'time': time.time()}) if __name__ == '__main__': app = web.Application() app.router.add_get('/', handle) web.run_app(app) In this small example, we're very close to how we would implement a synchronous app. The only hint we're async is the async keyword marking the handle function as being a coroutine. And that's what's going to be used at every level of an async Python app going forward. Here's another example using aiopg - a Postgresql lib for asyncio. From the project documentation: import asyncio import aiopg dsn = 'dbname=aiopg user=aiopg password=passwd host=127.0.0.1' async def go(): pool = await aiopg.create_pool(dsn) async with pool.acquire() as conn: async with conn.cursor() as cur: await cur.execute("SELECT 1") ret = [] async for row in cur: ret.append(row) assert ret == [(1,)] loop = asyncio.get_event_loop() loop.run_until_complete(go()) With a few async and await prefixes, the function that's performing a SQL query and send back the result looks a lot like a synchronous function. 
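The same coroutine style applies to outbound HTTP calls, which is where a microservice spends most of its time. The following is an illustrative sketch only: an aiohttp handler that awaits another, hypothetical service before responding. The inventory URL and the JSON fields are invented for the example, and a production service would typically reuse a single client session rather than creating one per request.

# Sketch: an aiohttp handler awaiting another microservice before responding.
# The downstream URL and JSON fields are hypothetical.
import aiohttp
from aiohttp import web

INVENTORY_URL = "http://inventory.internal/stock/{sku}"   # assumed internal endpoint

async def product_status(request):
    sku = request.match_info["sku"]
    async with aiohttp.ClientSession() as session:
        async with session.get(INVENTORY_URL.format(sku=sku)) as resp:
            stock = await resp.json()
    # While this coroutine awaits the network, the event loop serves other requests.
    return web.json_response({"sku": sku, "in_stock": stock.get("quantity", 0) > 0})

app = web.Application()
app.router.add_get("/products/{sku}", product_status)

if __name__ == "__main__":
    web.run_app(app)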
But asynchronous frameworks and libraries based on Python 3 are still emerging, and if you are using asyncio or a framework like aiohttp, you will need to stick with particular asynchronous implementations for each feature you need. If you need a library that is not asynchronous, using it from your asynchronous code means you will have to go through some extra and challenging work to avoid blocking the event loop. If your microservices are dealing with a limited number of resources, it could be manageable. But it's probably a safer bet at this point (2017) to stick with a synchronous framework that's been around for a while rather than an asynchronous one. Let's enjoy the existing ecosystem of mature packages, and wait until the asyncio ecosystem gets more sophisticated. And there are many great synchronous frameworks to build microservices with Python, like Bottle, Pyramid with Cornice, or Flask.

Language performance
In the previous sections, we've been through the two different ways to write microservices - asynchronous versus synchronous - and whatever technique you are using, the speed of Python directly impacts the performance of your microservice. Of course, everyone knows Python is slower than Java or Go - but execution speed is not always the top priority. A microservice is often a thin layer of code that spends most of its life waiting for network responses from other services. Its core speed is usually less important than how long your SQL queries take to return from your Postgres server, because the latter will represent most of the time spent building the response. But wanting an application that's as fast as possible is legitimate. One controversial topic in the Python community around speeding up the language is how the Global Interpreter Lock (GIL) mutex can hurt performance, because a multi-threaded application cannot run Python code on several cores at once. The GIL has good reasons to exist. It protects non-thread-safe parts of the CPython interpreter, and it exists in other languages like Ruby. All attempts to remove it so far have failed to produce a faster CPython implementation. Larry Hastings is working on a GIL-free CPython project called Gilectomy - https://github.com/larryhastings/gilectomy - whose minimal goal is to come up with a GIL-free implementation that can run a single-threaded application as fast as CPython. As of today (2017), this implementation is still slower than CPython. But it's interesting to follow this work and see if it reaches speed parity one day. That would make a GIL-free CPython very appealing. For microservices, besides preventing the usage of multiple cores in the same process, the GIL will slightly degrade performance under high load, because of the system call overhead introduced by the mutex. However, all the scrutiny around the GIL has had one beneficial impact: some work has been done in the past years to reduce its contention in the interpreter, and in some areas, Python performance has improved a lot. But bear in mind that even if the core team removes the GIL, Python is an interpreted language and the produced code will never be very efficient at execution time. Python provides the dis module if you are interested in seeing how the interpreter decomposes a function. In the example below, the interpreter will decompose a simple function that yields incremented values from a sequence in no less than 29 steps!

>>> def myfunc(data):
...     for value in data:
...         yield value + 1
...
>>> import dis
>>> dis.dis(myfunc)
  2           0 SETUP_LOOP              23 (to 26)
              3 LOAD_FAST                0 (data)
              6 GET_ITER
        >>    7 FOR_ITER                15 (to 25)
             10 STORE_FAST               1 (value)

  3          13 LOAD_FAST                1 (value)
             16 LOAD_CONST               1 (1)
             19 BINARY_ADD
             20 YIELD_VALUE
             21 POP_TOP
             22 JUMP_ABSOLUTE            7
        >>   25 POP_BLOCK
        >>   26 LOAD_CONST               0 (None)
             29 RETURN_VALUE

A similar function written in a statically compiled language would dramatically reduce the number of operations required to produce the same result. There are ways to speed up Python execution, though. One is to write part of your code as compiled code by building C extensions, or by using a static extension of the language like Cython (http://cython.org/) - but that makes your code more complicated. Another solution, which is the most promising one, is simply to run your application with the PyPy interpreter (http://pypy.org/). PyPy implements a Just-In-Time (JIT) compiler. This compiler replaces, at run time, pieces of Python with machine code that can be used directly by the CPU. The whole trick for the JIT is to detect, in real time and ahead of the execution, when and how to do it. Even if PyPy is always a few Python versions behind CPython, it has reached a point where you can use it in production, and its performance can be quite amazing. In one of our projects at Mozilla that needs fast execution, the PyPy version was almost as fast as the Go version, and we decided to use Python there instead. The PyPy Speed Center website is a great place to look at how PyPy compares to CPython - http://speed.pypy.org/ However, if your program uses C extensions, you will need to recompile them for PyPy, and that can be a problem, in particular if other developers maintain some of the extensions you are using. But if you are building your microservice with a standard set of libraries, the chances are that it will work out of the box with the PyPy interpreter, so that's worth a try. In any case, for most projects, the benefits of Python and its ecosystem largely surpass the performance issues described in this section, because the overhead in a microservice is rarely a problem.

Summary
In this article, we saw that Python is considered to be one of the best languages to write web applications, and therefore microservices - for the same reasons it's a language of choice in other areas, and also because it provides tons of mature frameworks and packages to do the work.

Resources for Article: Further resources on this subject: Inbuilt Data Types in Python [article] Getting Started with Python Packages [article] Layout Management for Python GUI [article]

What are Microservices?

Packt
20 Jun 2017
12 min read
In this article written by Gaurav Kumar Aroraa, Lalit Kale, Kanwar Manish, authors of the book Building Microservices with .NET Core, we will start with a brief introduction. Then, we will define its predecessors: monolithic architecture and service-oriented architecture (SOA). After this, we will see how microservices fare against both SOA and the monolithic architecture. We will then compare the advantages and disadvantages of each one of these architectural styles. This will enable us to identify the right scenario for these styles. We will understand the problems that arise from having a layered monolithic architecture. We will discuss the solutions available to these problems in the monolithic world. At the end, we will be able to break down a monolithic application into a microservice architecture. We will cover the following topics in this article: Origin of microservices Discussing microservices (For more resources related to this topic, see here.) Origin of microservices The term microservices was used for the first time in mid-2011 at a workshop of software architects. In March 2012, James Lewis presented some of his ideas about microservices. By the end of 2013, various groups from the IT industry started having discussions on microservices, and by 2014, it had become popular enough to be considered a serious contender for large enterprises. There is no official introduction available for microservices. The understanding of the term is purely based on the use cases and discussions held in the past. We will discuss this in detail, but before that, let's check out the definition of microservices as per Wikipedia (https://en.wikipedia.org/wiki/Microservices), which sums it up as: Microservices is a specialization of and implementation approach for SOA used to build flexible, independently deployable software systems. In 2014, James Lewis and Martin Fowler came together and provided a few real-world examples and presented microservices (refer to http://martinfowler.com/microservices/) in their own words and further detailed it as follows: The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies. It is very important that you see all the attributes James and Martin defined here. They defined it as an architectural style that developers could utilize to develop a single application with the business logic spread across a bunch of small services, each having their own persistent storage functionality. Also, note its attributes: it can be independently deployable, can run in its own process, is a lightweight communication mechanism, and can be written in different programming languages. We want to emphasize this specific definition since it is the crux of the whole concept. And as we move along, it will come together by the time we finish this book. Discussing microservices Until now, we have gone through a few definitions of microservices; now, let's discuss microservices in detail. In short, a microservice architecture removes most of the drawbacks of SOA architectures.  
Slicing your application into a number of services is neither SOA nor microservices. However, combining service design and best practices from the SOA world along with a few emerging practices, such as isolated deployment, semantic versioning, providing lightweight services, and service discovery in polyglot programming, is microservices. We implement microservices to satisfy business features and implement them with reduced time to market and greater flexibility. Before we move on to understand the architecture, let's discuss the two important architectures that have led to its existence: The monolithic architecture style SOA Most of us would be aware of the scenario where during the life cycle of an enterprise application development, a suitable architectural style is decided. Then, at various stages, the initial pattern is further improved and adapted with changes that cater to various challenges, such as deployment complexity, large code base, and scalability issues. This is exactly how the monolithic architecture style evolved into SOA, further leading up to microservices. Monolithic architecture The monolithic architectural style is a traditional architecture type and has been widely used in the industry. The term "monolithic" is not new and is borrowed from the Unix world. In Unix, most of the commands exist as a standalone program whose functionality is not dependent on any other program. As seen in the succeeding image, we can have different components in the application such as: User interface: This handles all of the user interaction while responding with HTML or JSON or any other preferred data interchange format (in the case of web services). Business logic: All the business rules applied to the input being received in the form of user input, events, and database exist here. Database access: This houses the complete functionality for accessing the database for the purpose of querying and persisting objects. A widely accepted rule is that it is utilized through business modules and never directly through user-facing components. Software built using this architecture is self-contained. We can imagine a single .NET assembly that contains various components, as described in the following image: As the software is self-contained here, its components are interconnected and interdependent. Even a simple code change in one of the modules may break a major functionality in other modules. This would result in a scenario where we'd need to test the whole application. With the business depending critically on its enterprise application frameworks, this amount of time could prove to be very critical. Having all the components tightly coupled poses another challenge: whenever we execute or compile such software, all the components should be available or the build will fail; refer to the preceding image that represents a monolithic architecture and is a self-contained or a single .NET assembly project. However, monolithic architectures might also have multiple assemblies. This means that even though a business layer (assembly, data access layer assembly, and so on) is separated, at run time, all of them will come together and run as one process.  A user interface depends on other components' direct sale and inventory in a manner similar to all other components that depend upon each other. In this scenario, we will not be able to execute this project in the absence of any one of these components. 
The process of upgrading any one of these components will be more complex as we may have to consider other components that require code changes too. This results in more development time than required for the actual change. Deploying such an application will become another challenge. During deployment, we will have to make sure that each and every component is deployed properly; otherwise, we may end up facing a lot of issues in our production environments. If we develop an application using the monolithic architecture style, as discussed previously, we might face the following challenges: Large code base: This is a scenario where the code lines outnumber the comments by a great margin. As components are interconnected, we will have to bear with a repetitive code base. Too many business modules: This is in regard to modules within the same system. Code base complexity: This results in a higher chance of code breaking due to the fix required in other modules or services. Complex code deployment: You may come across minor changes that would require whole system deployment. One module failure affecting the whole system: This is in regard to modules that depend on each other. Scalability: This is required for the entire system and not just the modules in it. Intermodule dependency: This is due to tight coupling. Spiraling development time: This is due to code complexity and interdependency. Inability to easily adapt to a new technology: In this case, the entire system would need to be upgraded. As discussed earlier, if we want to reduce development time, ease of deployment, and improve maintainability of software for enterprise applications, we should avoid the traditional or monolithic architecture. Service-oriented architecture In the previous section, we discussed the monolithic architecture and its limitations. We also discussed why it does not fit into our enterprise application requirements. To overcome these issues, we should go with some modular approach where we can separate the components such that they should come out of the self-contained or single .NET assembly. The main difference between SOA & monolithic is not one or multiple assembly. But as the service in SOA runs as separate process, SOA scales better compared to monolithic. Let's discuss the modular architecture, that is, SOA. This is a famous architectural style using which the enterprise applications are designed with a collection of services as its base. These services may be RESTful or ASMX Web services. To understand SOA in more detail, let's discuss "service" first. What is service? Service, in this case, is an essential concept of SOA. It can be a piece of code, program, or software that provides some functionality to other system components. This piece of code can interact directly with the database or indirectly through another service. Furthermore, it can be consumed by clients directly, where the client may either be a website, desktop app, mobile app, or any other device app. Refer to the following diagram: Service refers to a type of functionality exposed for consumption by other systems (generally referred to as clients/client applications). As mentioned earlier, it can be represented by a piece of code, program, or software. Such services are exposed over the HTTP transport protocol as a general practice. However, the HTTP protocol is not a limiting factor, and a protocol can be picked as deemed fit for the scenario. 
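To ground the idea of a service as a piece of functionality exposed over HTTP for other systems to consume, here is a deliberately tiny illustration written with the Python standard library. The book's own examples use .NET, and the endpoint and sample catalog data below are invented for this sketch; note that every response is computed purely from the request parameters, a property that matters again when statelessness is discussed below.

# Illustrative sketch: a tiny "product detail" service exposed over HTTP.
# Any client (web, desktop, mobile) or another service can consume it.
# The endpoint and catalog data are hypothetical.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

CATALOG = {"1": {"name": "Laptop", "price": 799}, "2": {"name": "Phone", "price": 399}}

class ProductHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        product_id = params.get("id", [""])[0]
        product = CATALOG.get(product_id)
        body = json.dumps(product if product else {"error": "not found"}).encode("utf8")
        self.send_response(200 if product else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Clients would call, for example, GET http://localhost:9000/products?id=1
    HTTPServer(("", 9000), ProductHandler).serve_forever()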
In the following image, Service – direct selling is directly interacting with Database, and three different clients, namely Web, Desktop, and Mobile, are consuming the service. On the other hand, we have clients consuming Service – partner selling, which is interacting with Service – channel partners for database access. A product selling service is a set of services that interacts with client applications and provides database access directly or through another service, in this case, Service – Channel partner.  In the case of Service – direct selling, shown in the preceding example, it is providing some functionality to a Web Store, a desktop application, and a mobile application. This service is further interacting with the database for various tasks, namely fetching data, persisting data, and so on. Normally, services interact with other systems via some communication channel, generally the HTTP protocol. These services may or may not be deployed on the same or single servers. In the preceding image, we have projected an SOA example scenario. There are many fine points to note here, so let's get started. Firstly, our services can be spread across different physical machines. Here, Service-direct selling is hosted on two separate machines. It is a possible scenario that instead of the entire business functionality, only a part of it will reside on Server 1 and the remaining on Server 2. Similarly, Service – partner selling appears to be having the same arrangement on Server 3 and Server 4. However, it doesn't stop Service – channel partners being hosted as a complete set on both the servers: Server 5 and Server 6. A system that uses a service or multiple services in a fashion mentioned in the preceding figure is called an SOA. We will discuss SOA in detail in the following sections. Let's recall the monolithic architecture. In this case, we did not use it because it restricts code reusability; it is a self-contained assembly, and all the components are interconnected and interdependent. For deployment, in this case, we will have to deploy our complete project after we select the SOA (refer to preceding image and subsequent discussion). Now, because of the use of this architectural style, we have the benefit of code reusability and easy deployment. Let's examine this in the wake of the preceding figure: Reusability: Multiple clients can consume the service. The service can also be simultaneously consumed by other services. For example, OrderService is consumed by web and mobile clients. Now, OrderService can also be used by the Reporting Dashboard UI. Stateless: Services do not persist any state between requests from the client, that is, the service doesn't know, nor care, that the subsequent request has come from the client that has/hasn't made the previous request. Contract-based: Interfaces make it technology-agnostic on both sides of implementation and consumption. It also serves to make it immune to the code updates in the underlying functionality. Scalability: A system can be scaled up; SOA can be individually clustered with appropriate load balancing. Upgradation: It is very easy to roll out new functionalities or introduce new versions of the existing functionality. The system doesn't stop you from keeping multiple versions of the same business functionality. Summary In this article, we discussed what the microservice architectural style is in detail, its history, and how it differs from its predecessors: monolithic and SOA. 
We further defined the various challenges that monolithic faces when dealing with large systems. Scalability and reusability are some definite advantages that SOA provides over monolithic. We also discussed the limitations of the monolithic architecture, including scaling problems, by implementing a real-life monolithic application. The microservice architecture style resolves all these issues by reducing code interdependency and isolating the dataset size that any one of the microservices works upon. We utilized dependency injection and database refactoring for this. We further explored automation, CI, and deployment. These easily allow the development team to let the business sponsor choose what industry trends to respond to first. This results in cost benefits, better business response, timely technology adoption, effective scaling, and removal of human dependency. Resources for Article: Further resources on this subject: Microservices and Service Oriented Architecture [article] Breaking into Microservices Architecture [article] Microservices – Brave New World [article]


Hands on with Service Fabric

Packt
06 Apr 2017
12 min read
In this article by Rahul Rai and Namit Tanasseri, authors of the book Microservices with Azure, explains that Service Fabric as a platform supports multiple programming models. Each of which is best suited for specific scenarios. Each programming model offers different levels of integration with the underlying management framework. Better integration leads to more automation and lesser overheads. Picking the right programming model for your application or services is the key to efficiently utilize the capabilities of Service Fabric as a hosting platform. Let's take a deeper look into these programming models. (For more resources related to this topic, see here.) To start with, let's look at the least integrated hosting option: Guest Executables. Native windows applications or application code using Node.js or Java can be hosted on Service Fabric as a guest executable. These executables can be packaged and pushed to a Service Fabric cluster like any other services. As the cluster manager has minimal knowledge about the executable, features like custom health monitoring, load reporting, state store and endpoint registration cannot be leveraged by the hosted application. However, from a deployment standpoint, a guest executable is treated like any other service. This means that for a guest executable, Service Fabric cluster manager takes care of high availability, application lifecycle management, rolling updates, automatic failover, high density deployment and load balancing. As an orchestration service, Service Fabric is responsible for deploying and activating an application or application services within a cluster. It is also capable of deploying services within a container image. This programming model is addressed as Guest Containers. The concept of containers is best explained as an implementation of operating system level virtualization. They are encapsulated deployable components running on isolated process boundaries sharing the same kernel. Deployed applications and their runtime dependencies are bundles within the container with an isolated view of all operating system constructs. This makes containers highly portable and secure. Guest container programming model is usually chosen when this level of isolation is required for the application. As containers don't have to boot an operating system, they have fast boot up time and are comparatively small in size. A prime benefit of using Service Fabric as a platform is the fact that it supports heterogeneous operating environments. Service Fabric supports two types of containers to be deployed as guest containers: Docker containers on Linux and Windows server containers. Container images for Docker containers are stored in Docker Hub and Docker APIs are used to create and manage the containers deployed on Linux kernel. Service Fabric supports two different types of containers in Windows Server 2016 with different levels of isolation. They are: Windows Server containers and Windows Hyper-V containers Windows Server containers are similar to Docker containers in terms of the isolation they provide. Windows Hyper-V containers offer higher degree of isolation and security by not sharing the operating system kernel across instances. These are ideally used when a higher level of security isolation is required such as systems requiring hostile multitenant hosts. The following figure illustrates the different isolation levels achieved by using these containers. 
Container isolation levels Service Fabric application model treats containers as an application host which can in turn host service replicas. There are three ways of utilizing containers within a Service Fabric application mode. Existing applications like Node.js, JavaScript application of other executables can be hosted within a container and deployed on Service Fabric as a Guest Container. A Guest Container is treated similar to a Guest Executable by Service Fabric runtime. The second scenario supports deploying stateless services inside a container hosted on Service Fabric. Stateless services using Reliable Services and Reliable actors can be deployed within a container. The third option is to deploy stateful services in containers hosted on Service Fabric. This model also supports Reliable Services and Reliable Actors. Service Fabric offers several features to manage containerized Microservices. These include container deployment and activation, resource governance, repository authentication, port mapping, container discovery and communication and ability to set environment variables. While containers offer a good level of isolation it is still heavy in terms of deployment footprint. Service Fabric offers a simpler, powerful programming model to develop your services which they call Reliable Services. Reliable services let you develop stateful and stateless services which can be directly deployed on Service Fabric clusters. For stateful services, the state can be stored close to the compute by using Reliable Collections. High availability of the state store and replication of the state is taken care by the Service Fabric cluster management services. This contributes substantially to the performance of the system by improving the latency of data access. Reliable services come with a built-in pluggable communication model which supports HTTP with Web API, WebSockets and custom TCP protocols out of the box. A Reliable service is addressed as stateless if it does not maintain any state within it or if the scope of the state stored is limited to a service call and is entirely disposable. This means that a stateless service does not require to persist, synchronize or replicate state. A good example for this service is a weather service like MSN weather service. A weather service can be queried to retrieve weather conditions associated with a specific geographical location. The response is totally based on the parameters supplied to the service. This service does not store any state. Although stateless services are simpler to implement, most of the services in real life are not stateless. They either store state in an external state store or an internal one. Web front end hosting APIs or web applications are good use cases to be hosted as stateless services. A stateful service persists states. The outcome of a service call made to a stateful service is usually influenced by the state persisted by the service. A service exposed by a bank to return the balance on an account is a good example for a stateful service. The state may be stored in an external data store such as Azure SQL Database, Azure Blobs or Azure Table store. Most services prefer to store the state externally considering the challenges around reliability, availability, scalability and consistency of the data store. With Service Fabric, state can be stored close to the compute by using reliable collections. To makes things more lightweight, Service Fabric also offers a programming model based on Virtual actor pattern. 
This programming model is called Reliable Actors. The Reliable Actors programming model is built on top of Reliable Services. This guarantees the scalability and reliability of the services. An Actor can be defined as an isolated, independent unit of compute and state with single-threaded execution. Actors can be created, managed and disposed independent of each other. Large number of actors can coexist and execute at a time. Service Fabric Reliable Actors are a good fit for systems which are highly distributed and dynamic by nature. Every actor is defined as an instance of an actor type; the same way an object is an instance of a class. Each actor is uniquely identified by an actor ID. The lifetime of Service Fabric Actors is not tied to their in-memory state. As a result, Actors are automatically created the first time a request for them is made. Reliable Actor's garbage collector takes care of disposing unused Actors in memory. Now that we understand the programming models, let's take a look at how the services deployed on Service Fabric are discovered and how the communication between services takes place. Service Fabric discovery and communication An application built on top of Microservices is usually composed of multiple services, each of which runs multiple replicas. Each service is specialized in a specific task. To achieve an end to end business use case, multiple services will need to be stitched together. This requires services to communicate to each other. A simple example would be web front end service communicating with the middle tier services which in turn connects to the back end services to handle a single user request. Some of these middle tier services can also be invoked by external applications. Services deployed on Service Fabric are distributed across multiple nodes in a cluster of virtual machines. The services can move across dynamically. This distribution of services can wither be triggered by a manual action of be result of Service Fabric cluster manager re-balancing services to achieve optimal resource utilization. This makes communication a challenge as services are not tied to a particular machine. Let's understand how Service Fabric solved this challenge for its consumers. Service protocols Service Fabric, as a hosting platform for Microservices does not interfere in the implementation of the service. On top of this, it also lets services decide on the communication channels they want to open. These channels are addressed as service endpoints. During service initiation, Service Fabric provides the opportunity for the services to set up the endpoints for incoming request on any protocol or communication stack. The endpoints are defined according to common industry standards, that is IP:Port. It is possible that multiple service instances share a single host process. In which case, they either have to use different ports or a port sharing mechanism. This will ensure that every service instance is uniquely addressable. Service endpoints Service discovery Service Fabric can rebalance services deployed on a cluster as a part of orchestration activities. This can be caused by resource balancing activities, failovers, upgrades, scale outs or scale ins. This will result in change in service endpoint addresses as the services move across different virtual machines. Service distribution The Service Fabric Naming Service is responsible for abstracting this complexity from the consuming service or application. 
Naming service takes care of service discovery and resolution. All service instances in Services Fabric are identified by a unique URL like fabric:/MyMicroServiceApp/AppService1. This name stays constant across the lifetime of the service although the endpoint addresses which physically host the service may change. Internally, Service Fabric manages a map between the service names and the physical location where the service is hosted. This is similar to the DNS service which is used to resolve Website URLs to IP addresses. The following figure illustrates the name resolution process for a service hosted on Service Fabric: Name resolution Connections from applications external to Service Fabric Service communications to or between services hosted in Service Fabric can be categorized as internal or external. Internal communication among services hosted on Service Fabric is easily achieved using the Naming Service. External communication, originated from an application or a user outside the boundaries of Service Fabric will need some extra work. To understand how this works, let's dive deeper in to the logical network layout of a typical Service Fabric cluster. Service Fabric cluster is always placed behind an Azure Load Balancer. The Load Balancer acts like a gateway to all traffic which needs to pass to the Service Fabric cluster. The Load Balancer is aware of every post open on every node of a cluster. When a request hits the Load Balancer, it identifies the port the request is looking for and randomly routes the request to one of the nodes which has the requested port open. The Load Balancer is not aware of the services running on the nodes or the ports associated with the services. The following figure illustrates request routing in action. Request routing Configuring ports and protocols The protocol and the ports to be opened by a Service Fabric cluster can be easily configured through the portal. Let's take an example to understand the configuration in detail. If we need a web application to be hosted on a Service Fabric cluster which should have port 80 opened on HTTP to accept incoming traffic, the following steps should be performed. Configuring service manifest Once a service listening to port 80 is authored, we need to configure port 80 in the service manifest to open a listener in the service. This can be done by editing the Service Manifest.xml. <Resources> <Endpoints> <Endpoint Name="WebEndpoint" Protocol="http" Port="80" /> </Endpoints> </Resources> Configuring custom end point On the Service Fabric cluster, configure port 80 as a custom endpoint. This can be easily done through the Azure Management portal. Configuring custom port Configure Azure Load Balancer Once the cluster is configured and created, the Azure Load Balancer can be instructed to forward the traffic to port 80. If the Service Fabric cluster is created through the portal, this step is automatically taken care for every port which is configured on the cluster configuration. Configuring Azure Load Balancer Configure health check Azure Load Balancer probes the ports on the nodes for their availability to ensure reliability of the service. The probes can be configured on the Azure portal. This is an optional step as a default probe configuration is applied for each endpoint when a cluster is created. Configuring probe Built-in Communication API Service Fabric offers many built-in communication options to support inter service communications. Service Remoting is one of them. 
This option allows strongly typed remote procedure calls between Reliable Services and Reliable Actors. It is very easy to set up and operate, as Service Remoting handles resolution of service addresses, connection, retry, and error handling. Service Fabric also supports HTTP for language-agnostic communication. The Service Fabric SDK exposes the ICommunicationClient and ServicePartitionClient classes for service resolution, HTTP connections, and retry loops. WCF is also supported by Service Fabric as a communication channel to enable legacy workloads to be hosted on it. The SDK exposes WcfCommunicationListener for the server side and the WcfCommunicationClient and ServicePartitionClient classes for the client to ease programming hurdles.

article-image-microservices-and-service-oriented-architecture
Packt
09 Mar 2017
6 min read
Save for later

Microservices and Service Oriented Architecture

Packt
09 Mar 2017
6 min read
Microservices are an architecture style and an approach to software development that satisfies modern business demands. They are not a new invention as such; they are instead an evolution of previous architecture styles. Many organizations today use them - they can improve organizational agility, speed of delivery, and ability to scale. Microservices give you a way to develop more physically separated, modular applications. This tutorial has been taken from Spring 5.0 Microservices - Second Edition. Microservices are similar to conventional service-oriented architectures. In this article, we will see how microservices are related to SOA. The emergence of microservices Many organizations, such as Netflix, Amazon, and eBay, successfully used what is known as the 'divide and conquer' technique to functionally partition their monolithic applications into smaller atomic units, each of which performs a single function - a 'service'. These organizations solved a number of prevailing issues they were experiencing with their monolithic applications. Following their success, many other organizations started adopting this as a common pattern to refactor their monolithic applications. Later, evangelists termed this pattern the microservices architecture. Microservices originated from the idea of Hexagonal Architecture, coined by Alistair Cockburn back in 2005. Hexagonal Architecture, or the Hexagonal pattern, is also known as the Ports and Adapters pattern. Cockburn defined microservices as: "...an architectural style or an approach for building IT systems as a set of business capabilities that are autonomous, self contained, and loosely coupled." The following diagram depicts a traditional N-tier application architecture with a presentation layer, a business layer, and a database layer: Modules A, B, and C represent three different business capabilities. The layers in the diagram represent a separation of architectural concerns, and each layer holds the parts of all three business capabilities pertaining to that layer: the presentation layer has the web components of all three modules, the business layer has their business components, and the database hosts their tables. In most cases, the layers can be physically distributed, whereas the modules within a layer are hardwired together. Let's now examine a microservice-based architecture: As we can see in the preceding diagram, the boundaries are inverted in the microservices architecture. Each vertical slice represents a microservice, and each microservice has its own presentation layer, business layer, and database layer. Microservices are aligned toward business capabilities. Because of this, changes to one microservice do not impact the others. There is no standard for communication or transport mechanisms for microservices. In general, microservices communicate with each other using widely adopted lightweight protocols, such as HTTP and REST, or messaging protocols, such as JMS or AMQP. In specific cases, one might choose more optimized communication protocols, such as Thrift, ZeroMQ, Protocol Buffers, or Avro. As microservices are more closely aligned with business capabilities and have independently manageable lifecycles, they are an ideal choice for enterprises embarking on DevOps and cloud. DevOps and cloud are two facets of microservices. How do microservices compare to Service Oriented Architectures? One of the common questions that arises when dealing with the microservices architecture is how it differs from SOA. SOA and microservices follow similar concepts. 
Earlier in this article, we saw that microservices evolved from SOA, and that many service characteristics are common to both approaches. However, are they the same or different? As microservices evolved from SOA, many characteristics of microservices are similar to those of SOA. Let's first examine the definition of SOA. The Open Group definition of SOA is as follows: "SOA is an architectural style that supports service-orientation. Service-orientation is a way of thinking in terms of services and service-based development and the outcomes of services. [A service] is self-contained, may be composed of other services, and is a 'black box' to consumers of the service." You have learned similar aspects in microservices as well. So, in what way are microservices different? The answer is: it depends. The answer to the previous question could be yes or no, depending upon the organization and its adoption of SOA. SOA is a broader term, and different organizations approached SOA differently to solve different organizational problems. The difference between microservices and SOA lies in the way an organization has approached SOA. In order to get clarity, a few cases will be examined here. Service-oriented integration Service-oriented integration refers to a service-based integration approach used by many organizations: Many organizations would have used SOA primarily to solve their integration complexities, also known as integration spaghetti. Generally, this is termed Service Oriented Integration (SOI). In such cases, applications communicate with each other through a common integration layer using standard protocols and message formats, such as SOAP/XML-based web services over HTTP or Java Message Service (JMS). These types of organizations focus on Enterprise Integration Patterns (EIP) to model their integration requirements. This approach strongly relies on a heavyweight Enterprise Service Bus (ESB), such as TIBCO Business Works, WebSphere ESB, Oracle ESB, and the like. Most of the ESB vendors also packaged a set of related products, such as rules engines, business process management engines, and so on, as a SOA suite. Such organizations' integrations are deeply rooted in these products. They either write heavy orchestration logic in the ESB layer or put the business logic itself in the service bus. In both cases, all enterprise services are deployed and accessed through the ESB, and these services are managed through an enterprise governance model. For such organizations, microservices are altogether different from SOA. Legacy modernization SOA is also used to build service layers on top of legacy applications, as shown in the following diagram: Another category of organizations would have used SOA in transformation or legacy modernization projects. In such cases, the services are built and deployed in the ESB, connecting to backend systems using ESB adapters. For these organizations, microservices are different from SOA. Service-oriented application Some organizations would have adopted SOA at an application level: In this approach, as shown in the preceding diagram, lightweight integration frameworks, such as Apache Camel or Spring Integration, are embedded within applications to handle service-related cross-cutting capabilities, such as protocol mediation, parallel execution, orchestration, and service integration. 
As some of the lightweight integration frameworks had native Java object support, such applications would even have used native Plain Old Java Object (POJO) services for integration and data exchange between services. As a result, all services have to be packaged as one monolithic web archive. Such organizations could see microservices as the next logical step of their SOA. Monolithic migration using SOA The following diagram represents Logical System Boundaries: The last possibility is transforming a monolithic application into smaller units after hitting the breaking point with the monolithic system. Such organizations would have broken the application into smaller, physically deployable subsystems, similar to the Y-axis scaling approach explained earlier, and deployed them as web archives on web servers or as JARs on some home-grown containers. These subsystems, exposed as services, would have used web services or other lightweight protocols to exchange data between services, and they would have used SOA and service design principles to achieve this. Such organizations may tend to think that microservices are the same old wine in a new bottle. Further resources on this subject: Building Scalable Microservices [article] Breaking into Microservices Architecture [article] A capability model for microservices [article]

article-image-building-scalable-microservices
Packt
18 Jan 2017
33 min read
Save for later

Building Scalable Microservices

Packt
18 Jan 2017
33 min read
In this article by Vikram Murugesan, the author of the book Microservices Deployment Cookbook, we will see a brief introduction to concept of the microservices. (For more resources related to this topic, see here.) Writing microservices with Spring Boot Now that our project is ready, let's look at how to write our microservice. There are several Java-based frameworks that let you create microservices. One of the most popular frameworks from the Spring ecosystem is the Spring Boot framework. In this article, we will look at how to create a simple microservice application using Spring Boot. Getting ready Any application requires an entry point to start the application. For Java-based applications, you can write a class that has the main method and run that class as a Java application. Similarly, Spring Boot requires a simple Java class with the main method to run it as a Spring Boot application (microservice). Before you start writing your Spring Boot microservice, you will also require some Maven dependencies in your pom.xml file. How to do it… Create a Java class called com.packt.microservices.geolocation.GeoLocationApplication.java and give it an empty main method: package com.packt.microservices.geolocation; public class GeoLocationApplication { public static void main(String[] args) { // left empty intentionally } } Now that we have our basic template project, let's make our project a child project of Spring Boot's spring-boot-starter-parent pom module. This module has a lot of prerequisite configurations in its pom.xml file, thereby reducing the amount of boilerplate code in our pom.xml file. At the time of writing this, 1.3.6.RELEASE was the most recent version: <parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>1.3.6.RELEASE</version> </parent> After this step, you might want to run a Maven update on your project as you have added a new parent module. If you see any warnings about the version of the maven-compiler plugin, you can either ignore it or just remove the <version>3.5.1</version> element. If you remove the version element, please perform a Maven update afterward. Spring Boot has the ability to enable or disable Spring modules such as Spring MVC, Spring Data, and Spring Caching. In our use case, we will be creating some REST APIs to consume the geolocation information of the users. So we will need Spring MVC. Add the following dependencies to your pom.xml file: <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> </dependencies> We also need to expose the APIs using web servers such as Tomcat, Jetty, or Undertow. Spring Boot has an in-memory Tomcat server that starts up as soon as you start your Spring Boot application. So we already have an in-memory Tomcat server that we could utilize. Now let's modify the GeoLocationApplication.java class to make it a Spring Boot application: package com.packt.microservices.geolocation; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; @SpringBootApplication public class GeoLocationApplication { public static void main(String[] args) { SpringApplication.run(GeoLocationApplication.class, args); } } As you can see, we have added an annotation, @SpringBootApplication, to our class. 
The @SpringBootApplication annotation reduces the number of lines of code written by adding the following three annotations implicitly: @Configuration @ComponentScan @EnableAutoConfiguration If you are familiar with Spring, you will already know what the first two annotations do. @EnableAutoConfiguration is the only annotation that is part of Spring Boot. The AutoConfiguration package has an intelligent mechanism that guesses the configuration of your application and automatically configures the beans that you will likely need in your code. You can also see that we have added one more line to the main method, which actually tells Spring Boot the class that will be used to start this application. In our case, it is GeoLocationApplication.class. If you would like to add more initialization logic to your application, such as setting up the database or setting up your cache, feel free to add it here. Now that our Spring Boot application is all set to run, let's see how to run our microservice. Right-click on GeoLocationApplication.java from Package Explorer, select Run As, and then select Spring Boot App. You can also choose Java Application instead of Spring Boot App. Both the options ultimately do the same thing. You should see something like this on your STS console: If you look closely at the console logs, you will notice that Tomcat is being started on port number 8080. In order to make sure our Tomcat server is listening, let's run a simple curl command. cURL is a command-line utility available on most Unix and Mac systems. For Windows, use tools such as Cygwin or even Postman. Postman is a Google Chrome extension that gives you the ability to send and receive HTTP requests. For simplicity, we will use cURL. Execute the following command on your terminal: curl http://localhost:8080 This should give us an output like this: {"timestamp":1467420963000,"status":404,"error":"Not Found","message":"No message available","path":"/"} This error message is being produced by Spring. This verifies that our Spring Boot microservice is ready to start building on with more features. There are more configurations that are needed for Spring Boot, which we will perform later in this article along with Spring MVC. Writing microservices with WildFly Swarm WildFly Swarm is a J2EE application packaging framework from RedHat that utilizes the in-memory Undertow server to deploy microservices. In this article, we will create the same GeoLocation API using WildFly Swarm and JAX-RS. To avoid confusion and dependency conflicts in our project, we will create the WildFly Swarm microservice as its own Maven project. This article is just here to help you get started on WildFly Swarm. When you are building your production-level application, it is your choice to either use Spring Boot, WildFly Swarm, Dropwizard, or SparkJava based on your needs. Getting ready Similar to how we created the Spring Boot Maven project, create a Maven WAR module with the groupId com.packt.microservices and name/artifactId geolocation-wildfly. Feel free to use either your IDE or the command line. Be aware that some IDEs complain about a missing web.xml file. We will see how to fix that in the next section. How to do it… Before we set up the WildFly Swarm project, we have to fix the missing web.xml error. The error message says that Maven expects to see a web.xml file in your project as it is a WAR module, but this file is missing in your project. In order to fix this, we have to add and configure maven-war-plugin. 
Add the following code snippet to your pom.xml file's project section: <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-war-plugin</artifactId> <version>2.6</version> <configuration> <failOnMissingWebXml>false</failOnMissingWebXml> </configuration> </plugin> </plugins> </build> After adding the snippet, save your pom.xml file and perform a Maven update. Also, if you see that your project is using a Java version other than 1.8. Again, perform a Maven update for the changes to take effect. Now, let's add the dependencies required for this project. As we know that we will be exposing our APIs, we have to add the JAX-RS library. JAX-RS is the standard JSR-compliant API for creating RESTful web services. JBoss has its own version of JAX-RS. So let's  add that dependency to the pom.xml file: <dependencies> <dependency> <groupId>org.jboss.spec.javax.ws.rs</groupId> <artifactId>jboss-jaxrs-api_2.0_spec</artifactId> <version>1.0.0.Final</version> <scope>provided</scope> </dependency> </dependencies> The one thing that you have to note here is the provided scope. The provide scope in general means that this JAR need not be bundled with the final artifact when it is built. Usually, the dependencies with provided scope will be available to your application either via your web server or application server. In this case, when Wildfly Swarm bundles your app and runs it on the in-memory Undertow server, your server will already have this dependency. The next step toward creating the GeoLocation API using Wildfly Swarm is creating the domain object. Use the com.packt.microservices.geolocation.GeoLocation.java file. Now that we have the domain object, there are two classes that you need to create in order to write your first JAX-RS web service. The first of those is the Application class. The Application class in JAX-RS is used to define the various components that you will be using in your application. It can also hold some metadata about your application, such as your basePath (or ApplicationPath) to all resources listed in this Application class. In this case, we are going to use /geolocation as our basePath. Let's see how that looks: package com.packt.microservices.geolocation; import javax.ws.rs.ApplicationPath; import javax.ws.rs.core.Application; @ApplicationPath("/geolocation") public class GeoLocationApplication extends Application { public GeoLocationApplication() {} } There are two things to note in this class; one is the Application class and the other is the @ApplicationPath annotation—both of which we've already talked about. Now let's move on to the resource class, which is responsible for exposing the APIs. If you are familiar with Spring MVC, you can compare Resource classes to Controllers. They are responsible for defining the API for any specific resource. The annotations are slightly different from that of Spring MVC. Let's create a new resource class called com.packt.microservices.geolocation.GeoLocationResource.java that exposes a simple GET API: package com.packt.microservices.geolocation; import java.util.ArrayList; import java.util.List; import javax.ws.rs.GET; import javax.ws.rs.Path; import javax.ws.rs.Produces; @Path("/") public class GeoLocationResource { @GET @Produces("application/json") public List<GeoLocation> findAll() { return new ArrayList<>(); } } All the three annotations, @GET, @Path, and @Produces, are pretty self explanatory. 
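The GeoLocation domain object referenced above is not listed in this extract. Purely as an illustrative sketch (the field names and types are inferred from the JSON payloads used in the cURL examples later in this article, not taken from the book's listing), it might look something like this:

```java
package com.packt.microservices.geolocation;

// Hypothetical sketch of the domain object. The fields mirror the JSON
// payloads used later: {"timestamp", "userId", "latitude", "longitude"}.
public class GeoLocation {

    private String userId;
    private double latitude;
    private double longitude;
    private long timestamp;

    public GeoLocation() {
        // default constructor, needed for JSON (de)serialization
    }

    public String getUserId() { return userId; }
    public void setUserId(String userId) { this.userId = userId; }

    public double getLatitude() { return latitude; }
    public void setLatitude(double latitude) { this.latitude = latitude; }

    public double getLongitude() { return longitude; }
    public void setLongitude(double longitude) { this.longitude = longitude; }

    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
}
```

A plain POJO with a no-argument constructor and getters/setters is enough here, because the JSON provider bundled with the server handles the mapping to and from the request and response bodies.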
Before we start writing the APIs and the service class, let's test the application from the command line to make sure it works as expected. With the current implementation, any GET request sent to the /geolocation URL should return an empty JSON array. So far, we have created the RESTful APIs using JAX-RS. It's just another JAX-RS project: In order to make it a microservice using Wildfly Swarm, all you have to do is add the wildfly-swarm-plugin to the Maven pom.xml file. This plugin will be tied to the package phase of the build so that whenever the package goal is triggered, the plugin will create an uber JAR with all required dependencies. An uber JAR is just a fat JAR that has all dependencies bundled inside itself. It also deploys our application in an in-memory Undertow server. Add the following snippet to the plugins section of the pom.xml file: <plugin> <groupId>org.wildfly.swarm</groupId> <artifactId>wildfly-swarm-plugin</artifactId> <version>1.0.0.Final</version> <executions> <execution> <id>package</id> <goals> <goal>package</goal> </goals> </execution> </executions> </plugin> Now execute the mvn clean package command from the project's root directory, and wait for the Maven build to be successful. If you look at the logs, you can see that wildfly-swarm-plugin will create the uber JAR, which has all its dependencies. You should see something like this in your console logs: After the build is successful, you will find two artifacts in the target directory of your project. The geolocation-wildfly-0.0.1-SNAPSHOT.war file is the final WAR created by the maven-war-plugin. The geolocation-wildfly-0.0.1-SNAPSHOT-swarm.jar file is the uber JAR created by the wildfly-swarm-plugin. Execute the following command in the same terminal to start your microservice: java –jar target/geolocation-wildfly-0.0.1-SNAPSHOT-swarm.jar After executing this command, you will see that Undertow has started on port number 8080, exposing the geolocation resource we created. You will see something like this: Execute the following cURL command in a separate terminal window to make sure our API is exposed. The response of the command should be [], indicating there are no geolocations: curl http://localhost:8080/geolocation Now let's build the service class and finish the APIs that we started. For simplicity purposes, we are going to store the geolocations in a collection in the service class itself. In a real-time scenario, you will be writing repository classes or DAOs that talk to the database that holds your geolocations. Get the com.packt.microservices.geolocation.GeoLocationService.java interface. We'll use the same interface here. Create a new class called com.packt.microservices.geolocation.GeoLocationServiceImpl.java that extends the GeoLocationService interface: package com.packt.microservices.geolocation; import java.util.ArrayList; import java.util.Collections; import java.util.List; public class GeoLocationServiceImpl implements GeoLocationService { private static List<GeoLocation> geolocations = new ArrayList<>(); @Override public GeoLocation create(GeoLocation geolocation) { geolocations.add(geolocation); return geolocation; } @Override public List<GeoLocation> findAll() { return Collections.unmodifiableList(geolocations); } } Now that our service classes are implemented, let's finish building the APIs. We already have a very basic stubbed-out GET API. Let's just introduce the service class to the resource class and call the findAll method. 
Similarly, let's use the service's create method for POST API calls. Add the following snippet to GeoLocationResource.java: private GeoLocationService service = new GeoLocationServiceImpl(); @GET @Produces("application/json") public List<GeoLocation> findAll() { return service.findAll(); } @POST @Produces("application/json") @Consumes("application/json") public GeoLocation create(GeoLocation geolocation) { return service.create(geolocation); } We are now ready to test our application. Go ahead and build your application. After the build is successful, run your microservice: let's try to create two geolocations using the POST API and later try to retrieve them using the GET method. Execute the following cURL commands in your terminal one by one: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation This should give you something like the following output (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 9.568012, "longitude": 77.962444}' http://localhost:8080/geolocation This command should give you an output similar to the following (pretty-printed for readability): { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } To verify whether your entities were stored correctly, execute the following cURL command: curl http://localhost:8080/geolocation This should give you an output like this (pretty-printed for readability): [ { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 }, { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } ] Whatever we have seen so far will give you a head start in building microservices with WildFly Swarm. Of course, there are tons of features that WildFly Swarm offers. Feel free to try them out based on your application needs. I strongly recommend going through the WildFly Swarm documentation for any advanced usages. Writing microservices with Dropwizard Dropwizard is a collection of libraries that help you build powerful applications quickly and easily. The libraries vary from Jackson, Jersey, Jetty, and so on. You can take a look at the full list of libraries on their website. This ecosystem of libraries that help you build powerful applications could be utilized to create microservices as well. As we saw earlier, it utilizes Jetty to expose its services. In this article, we will create the same GeoLocation API using Dropwizard and Jersey. To avoid confusion and dependency conflicts in our project, we will create the Dropwizard microservice as its own Maven project. This article is just here to help you get started with Dropwizard. When you are building your production-level application, it is your choice to either use Spring Boot, WildFly Swarm, Dropwizard, or SparkJava based on your needs. Getting ready Similar to how we created other Maven projects,  create a Maven JAR module with the groupId com.packt.microservices and name/artifactId geolocation-dropwizard. Feel free to use either your IDE or the command line. 
After the project is created, if you see that your project is using a Java version other than 1.8. Perform a Maven update for the change to take effect. How to do it… The first thing that you will need is the dropwizard-core Maven dependency. Add the following snippet to your project's pom.xml file: <dependencies> <dependency> <groupId>io.dropwizard</groupId> <artifactId>dropwizard-core</artifactId> <version>0.9.3</version> </dependency> </dependencies> Guess what? This is the only dependency you will need to spin up a simple Jersey-based Dropwizard microservice. Before we start configuring Dropwizard, we have to create the domain object, service class, and resource class: com.packt.microservices.geolocation.GeoLocation.java com.packt.microservices.geolocation.GeoLocationService.java com.packt.microservices.geolocation.GeoLocationImpl.java com.packt.microservices.geolocation.GeoLocationResource.java Let's see what each of these classes does. The GeoLocation.java class is our domain object that holds the geolocation information. The GeoLocationService.java class defines our interface, which is then implemented by the GeoLocationServiceImpl.java class. If you take a look at the GeoLocationServiceImpl.java class, we are using a simple collection to store the GeoLocation domain objects. In a real-time scenario, you will be persisting these objects in a database. But to keep it simple, we will not go that far. To be consistent with the previous, let's change the path of GeoLocationResource to /geolocation. To do so, replace @Path("/") with @Path("/geolocation") on line number 11 of the GeoLocationResource.java class. We have now created the service classes, domain object, and resource class. Let's configure Dropwizard. In order to make your project a microservice, you have to do two things: Create a Dropwizard configuration class. This is used to store any meta-information or resource information that your application will need during runtime, such as DB connection, Jetty server, logging, and metrics configurations. These configurations are ideally stored in a YAML file, which will them be mapped to your Configuration class using Jackson. In this application, we are not going to use the YAML configuration as it is out of scope for this article. If you would like to know more about configuring Dropwizard, refer to their Getting Started documentation page at http://www.dropwizard.io/0.7.1/docs/getting-started.html. Let's  create an empty Configuration class called GeoLocationConfiguration.java: package com.packt.microservices.geolocation; import io.dropwizard.Configuration; public class GeoLocationConfiguration extends Configuration { } The YAML configuration file has a lot to offer. Take a look at a sample YAML file from Dropwizard's Getting Started documentation page to learn more. The name of the YAML file is usually derived from the name of your microservice. The microservice name is usually identified by the return value of the overridden method public String getName() in your Application class. 
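The article does not override getName() in the application class it creates next; if you wanted to give the service an explicit name, a hedged sketch of such an override (a fragment to be added inside that class, with "geolocation" as an assumed name for illustration only) might look like this:

```java
// Illustrative fragment only, not part of the listing that follows.
// Dropwizard uses this name for the service and to derive the expected
// YAML configuration file name.
@Override
public String getName() {
    return "geolocation"; // assumed name, chosen for this example
}
```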
Now let's create the GeoLocationApplication.java application class: package com.packt.microservices.geolocation; import io.dropwizard.Application; import io.dropwizard.setup.Environment; public class GeoLocationApplication extends Application<GeoLocationConfiguration> { public static void main(String[] args) throws Exception { new GeoLocationApplication().run(args); } @Override public void run(GeoLocationConfiguration config, Environment env) throws Exception { env.jersey().register(new GeoLocationResource()); } } There are a lot of things going on here. Let's look at them one by one. Firstly, this class extends Application with the GeoLocationConfiguration generic. This clearly makes an instance of your GeoLocationConfiguraiton.java class available so that you have access to all the properties you have defined in your YAML file at the same time mapped in the Configuration class. The next one is the run method. The run method takes two arguments: your configuration and environment. The Environment instance is a wrapper to other library-specific objects such as MetricsRegistry, HealthCheckRegistry, and JerseyEnvironment. For example, we could register our Jersey resources using the JerseyEnvironment instance. The env.jersey().register(new GeoLocationResource())line does exactly that. The main method is pretty straight-forward. All it does is call the run method. Before we can start the microservice, we have to configure this project to create a runnable uber JAR. Uber JARs are just fat JARs that bundle their dependencies in themselves. For this purpose, we will be using the maven-shade-plugin. Add the following snippet to the build section of the pom.xml file. If this is your first plugin, you might want to wrap it in a <plugins> element under <build>: <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.3</version> <configuration> <createDependencyReducedPom>true</createDependencyReducedPom> <filters> <filter> <artifact>*:*</artifact> <excludes> <exclude>META-INF/*.SF</exclude> <exclude>META-INF/*.DSA</exclude> <exclude>META-INF/*.RSA</exclude> </excludes> </filter> </filters> </configuration> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> <configuration> <transformers> <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" /> <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> <mainClass>com.packt.microservices.geolocation.GeoLocationApplication</mainClass> </transformer> </transformers> </configuration> </execution> </executions> </plugin> The previous snippet does the following: It creates a runnable uber JAR that has a reduced pom.xml file that does not include the dependencies that are added to the uber JAR. To learn more about this property, take a look at the documentation of maven-shade-plugin. It utilizes com.packt.microservices.geolocation.GeoLocationApplication as the class whose main method will be invoked when this JAR is executed. This is done by updating the MANIFEST file. It excludes all signatures from signed JARs. This is required to avoid security errors. Now that our project is properly configured, let's try to build and run it from the command line. To build the project, execute mvn clean package from the project's root directory in your terminal. This will create your final JAR in the target directory. 
Execute the following command to start your microservice: java -jar target/geolocation-dropwizard-0.0.1-SNAPSHOT.jar server The server argument instructs Dropwizard to start the Jetty server. After you issue the command, you should be able to see that Dropwizard has started the in-memory Jetty server on port 8080. If you see any warnings about health checks, ignore them. Your console logs should look something like this: We are now ready to test our application. Let's try to create two geolocations using the POST API and later try to retrieve them using the GET method. Execute the following cURL commands in your terminal one by one: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation This should give you an output similar to the following (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 9.568012, "longitude": 77.962444}' http://localhost:8080/geolocation This should give you an output like this (pretty-printed for readability): { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } To verify whether your entities were stored correctly, execute the following cURL command: curl http://localhost:8080/geolocation It should give you an output similar to the following (pretty-printed for readability): [ { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 }, { "latitude": 9.568012, "longitude": 77.962444, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } ] Excellent! You have created your first microservice with Dropwizard. Dropwizard offers more than what we have seen so far. Some of it is out of scope for this article. I believe the metrics API that Dropwizard uses could be used in any type of application. Writing your Dockerfile So far in this article, we have seen how to package our application and how to install Docker. Now that we have our JAR artifact and Docker set up, let's see how to Dockerize our microservice application using Docker. Getting ready In order to Dockerize our application, we will have to tell Docker how our image is going to look. This is exactly the purpose of a Dockerfile. A Dockerfile has its own syntax (or Dockerfile instructions) and will be used by Docker to create images. Throughout this article, we will try to understand some of the most commonly used Dockerfile instructions as we write our Dockerfile for the geolocation tracker microservice. How to do it… First, open your STS IDE and create a new file called Dockerfile in the geolocation project. The first line of the Dockerfile is always the FROM instruction followed by the base image that you would like to create your image from. There are thousands of images on Docker Hub to choose from. In our case, we would need something that already has Java installed on it. There are some images that are official, meaning they are well documented and maintained. Docker Official Repositories are very well documented, and they follow best practices and standards. Docker has its own team to maintain these repositories. 
This is essential in order to keep the repository clear, thus helping the user make the right choice of repository. To read more about Docker Official Repositories, take a look at https://docs.docker.com/docker-hub/official_repos/ We will be using the Java official repository. To find the official repository, go to hub.docker.com and search for java. You have to choose the one that says official. At the time of writing this, the Java image documentation says it will soon be deprecated in favor of the openjdk image. So the first line of our Dockerfile will look like this: FROM openjdk:8 As you can see, we have used version (or tag) 8 for our image. If you are wondering what type of operating system this image uses, take a look at the Dockerfile of this image, which you can get from the Docker Hub page. Docker images are usually tagged with the version of the software they are written for. That way, it is easy for users to pick from. The next step is creating a directory for our project where we will store our JAR artifact. Add this as your next line: RUN mkdir -p /opt/packt/geolocation This is a simple Unix command that creates the /opt/packt/geolocation directory. The –p flag instructs it to create the intermediate directories if they don't exist. Now let's create an instruction that will add the JAR file that was created in your local machine into the container at /opt/packt/geolocation. ADD target/geolocation-0.0.1-SNAPSHOT.jar /opt/packt/geolocation/ As you can see, we are picking up the uber JAR from target directory and dropping it into the /opt/packt/geolocation directory of the container. Take a look at the / at the end of the target path. That says that the JAR has to be copied into the directory. Before we can start the application, there is one thing we have to do, that is, expose the ports that we would like to be mapped to the Docker host ports. In our case, the in-memory Tomcat instance is running on port 8080. In order to be able to map port 8080 of our container to any port to our Docker host, we have to expose it first. For that, we will use the EXPOSE instruction. Add the following line to your Dockerfile: EXPOSE 8080 Now that we are ready to start the app, let's go ahead and tell Docker how to start a container for this image. For that, we will use the CMD instruction: CMD ["java", "-jar", "/opt/packt/geolocation/geolocation-0.0.1-SNAPSHOT.jar"] There are two things we have to note here. Once is the way we are starting the application and the other is how the command is broken down into comma-separated Strings. First, let's talk about how we start the application. You might be wondering why we haven't used the mvn spring-boot:run command to start the application. Keep in mind that this command will be executed inside the container, and our container does not have Maven installed, only OpenJDK 8. If you would like to use the maven command, take that as an exercise, and try to install Maven on your container and use the mvn command to start the application. Now that we know we have Java installed, we are issuing a very simple java –jar command to run the JAR. In fact, the Spring Boot Maven plugin internally issues the same command. The next thing is how the command has been broken down into comma-separated Strings. This is a standard that the CMD instruction follows. To keep it simple, keep in mind that for whatever command you would like to run upon running the container, just break it down into comma-separated Strings (in whitespaces). 
Your final Dockerfile should look something like this: FROM openjdk:8 RUN mkdir -p /opt/packt/geolocation ADD target/geolocation-0.0.1-SNAPSHOT.jar /opt/packt/geolocation/ EXPOSE 8080 CMD ["java", "-jar", "/opt/packt/geolocation/geolocation-0.0.1-SNAPSHOT.jar"] This Dockerfile is one of the simplest implementations. Dockerfiles can sometimes get bigger due to the fact that you need a lot of customizations to your image. In such cases, it is a good idea to break it down into multiple images that can be reused and maintained separately. There are some best practices to follow whenever you create your own Dockerfile and image. Though we haven't covered that here as it is out of the scope of this article, you still should take a look at and follow them. To learn more about the various Dockerfile instructions, go to https://docs.docker.com/engine/reference/builder/. Building your Docker image We created the Dockerfile, which will be used in this article to create an image for our microservice. If you are wondering why we would need an image, it is the only way we can ship our software to any system. Once you have your image created and uploaded to a common repository, it will be easier to pull your image from any location. Getting ready Before you jump right into it, it might be a good idea to get yourself familiar with some of the most commonly used Docker commands. In this article, we will use the build command. Take a look at this URL to understand the other commands: https://docs.docker.com/engine/reference/commandline/#/image-commands. After familiarizing yourself with the commands, open up a new terminal, and change your directory to the root of the geolocation project. Make sure your docker-machine instance is running. If it is not running, use the docker-machine start command to run your docker-machine instance: docker-machine start default If you have to configure your shell for the default Docker machine, go ahead and execute the following command: eval $(docker-machine env default) How to do it… From the terminal, issue the following docker build command: docker build –t packt/geolocation. We'll try to understand the command later. For now, let's see what happens after you issue the preceding command. You should see Docker downloading the openjdk image from Docker Hub. Once the image has been downloaded, you will see that Docker tries to validate each and every instruction provided in the Dockerfile. When the last instruction has been processed, you will see a message saying Successfully built. This says that your image has been successfully built. Now let's try to understand the command. There are three things to note here: The first thing is the docker build command itself. The docker build command is used to build a Docker image from a Dockerfile. It needs at least one input, which is usually the location of the Dockerfile. Dockerfiles can be renamed to something other than Dockerfile and can be referred to using the –f option of the docker build command. An instance of this being used is when teams have different Dockerfiles for different build environments, for example, using DockerfileDev for the dev environment, DockerfileStaging for the staging environment, and DockerfileProd for the production environment. It is still encouraged as best practice to use other Docker options in order to keep the same Dockerfile for all environments. The second thing is the –t option. The –t option takes the name of the repo and a tag. 
In our case, we have not mentioned the tag, so by default, it will pick up latest as the tag. If you look at the repo name, it is different from the official openjdk image name. It has two parts: packt and geolocation. It is always a good practice to put the Docker Hub account name followed by the actual image name as the name of your repo. For now, we will use packt as our account name, we will see how to create our own Docker Hub account and use that account name here. The third thing is the dot at the end. The dot operator says that the Dockerfile is located in the current directory, or the present working directory to be more precise. Let's go ahead and verify whether our image was created. In order to do that, issue the following command on your terminal: docker images The docker images command is used to list down all images available in your Docker host. After issuing the command, you should see something like this: As you can see, the newly built image is listed as packt/geolocation in your Docker host. The tag for this image is latest as we did not specify any. The image ID uniquely identifies your image. Note the size of the image. It is a few megabytes bigger than the openjdk:8 image. That is most probably because of the size of our executable uber JAR inside the container. Now that we know how to build an image using an existing Dockerfile, we are at the end of this article. This is just a very quick intro to the docker build command. There are more options that you can provide to the command, such as CPUs and memory. To learn more about the docker build command, take a look at this page: https://docs.docker.com/engine/reference/commandline/build/ Running your microservice as a Docker container We successfully created our Docker image in the Docker host. Keep in mind that if you are using Windows or Mac, your Docker host is the VirtualBox VM and not your local computer. In this article, we will look at how to spin off a container for the newly created image. Getting ready To spin off a new container for our packt/geolocation image, we will use the docker run command. This command is used to run any command inside your container, given the image. Open your terminal and go to the root of the geolocation project. If you have to start your Docker machine instance, do so using the docker-machine start command, and set the environment using the docker-machine env command. How to do it… Go ahead and issue the following command on your terminal: docker run packt/geolocation Right after you run the command, you should see something like this: Yay! We can see that our microservice is running as a Docker container. But wait—there is more to it. Let's see how we can access our microservice's in-memory Tomcat instance. Try to run a curl command to see if our app is up and running: Open a new terminal instance and execute the following cURL command in that shell: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://localhost:8080/geolocation Did you get an error message like this? curl: (7) Failed to connect to localhost port 8080: Connection refused Let's try to understand what happened here. Why would we get a connection refused error when our microservice logs clearly say that it is running on port 8080? 
Yes, you guessed it right: the microservice is not running on your local computer; it is actually running inside the container, which in turn is running inside your Docker host. Here, your Docker host is the VirtualBox VM called default. So we have to replace localhost with the IP of the container. But getting the IP of the container is not straightforward. That is the reason we are going to map port 8080 of the container to the same port on the VM. This mapping will make sure that any request made to port 8080 on the VM will be forwarded to port 8080 of the container. Now go to the shell that is currently running your container, and stop your container. Usually, Ctrl + C will do the job. After your container is stopped, issue the following command: docker run –p 8080:8080 packt/geolocation The –p option does the port mapping from Docker host to container. The port number to the left of the colon indicates the port number of the Docker host, and the port number to the right of the colon indicates that of the container. In our case, both of them are same. After you execute the previous command, you should see the same logs that you saw before. We are not done yet. We still have to find the IP that we have to use to hit our RESTful endpoint. The IP that we have to use is the IP of our Docker Machine VM. To find the IP of the docker-machine instance, execute the following command in a new terminal instance: docker-machine ip default. This should give you the IP of the VM. Let's say the IP that you received was 192.168.99.100. Now, replace localhost in your cURL command with this IP, and execute the cURL command again: curl -H "Content-Type: application/json" -X POST -d '{"timestamp": 1468203975, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "latitude": 41.803488, "longitude": -88.144040}' http://192.168.99.100:8080/geolocation This should give you an output similar to the following (pretty-printed for readability): { "latitude": 41.803488, "longitude": -88.14404, "userId": "f1196aac-470e-11e6-beb8-9e71128cae77", "timestamp": 1468203975 } This confirms that you are able to access your microservice from the outside. Take a moment to understand how the port mapping is done. The following figure shows how your machine, VM, and container are orchestrated: This confirms that you are able to access your microservice from the outside. Summary We looked at an example of a geolocation tracker application to see how it can be broken down into smaller and manageable services. Next, we saw how to create the GeoLocationTracker service using the Spring Boot framework. Resources for Article: Further resources on this subject: Domain-Driven Design [article] Breaking into Microservices Architecture [article] A capability model for microservices [article]
article-image-testing-and-quality-control
Packt
04 Jan 2017
19 min read
Save for later

Testing and Quality Control

Packt
04 Jan 2017
19 min read
In this article by Pablo Solar Vilariño and Carlos Pérez Sánchez, the authors of the book PHP Microservices, we will see the following topics: (For more resources related to this topic, see here.) Test-driven development Behavior-driven development Acceptance test-driven development Tools Test-driven development Test-Driven Development (TDD) is part of the Agile philosophy, and it aims to solve a common developer's problem: as an application evolves and grows, the code gets sick, so developers fix problems just to make it run, but every single line we add can introduce a new bug or even break other functions. Test-driven development is a learning technique that helps the developer to learn about the domain problem of the application they are going to build, doing it in an iterative, incremental, and constructivist way: Iterative because the technique always repeats the same process to get the value Incremental because for each iteration, we have more unit tests to be used Constructivist because it is possible to test all we are developing during the process straight away, so we can get immediate feedback Also, when we finish developing each unit test or iteration, we can forget about it because it will be kept from then on throughout the entire development process, helping us to remember the domain problem through the unit test; this is a good approach for forgetful developers. It is very important to understand that TDD includes four things: analysis, design, development, and testing. In other words, doing TDD means understanding the domain problem and correctly analyzing it, designing the application well, developing well, and testing it. It needs to be clear that TDD is not just about implementing unit tests; it is the whole process of software development. TDD perfectly matches projects based on microservices, because using microservices in a large project means dividing it into little microservices or functionalities, which is like a grouping of little projects connected by a communication channel. The project size is independent of using TDD, because in this technique you divide each functionality into little examples, and to do this it does not matter whether the project is big or small, and even less so when the project is divided into microservices. Microservices are also better suited than a monolithic project here, because the functionalities covered by the unit tests are organized into microservices, which helps the developers to know where they can begin using TDD. How to do TDD? Doing TDD is not difficult; we just need to follow some steps and repeat them, improving our code and checking that we did not break anything. TDD involves the following steps, and a minimal illustration of one such cycle follows the list: Write the unit test: It needs to be the simplest and clearest test possible, and once it is done, it has to fail; this is mandatory. If it does not fail, there is something that we are not doing properly. Run the tests: If it has errors (it fails), this is the moment to develop the minimum code to pass the test, just what is necessary; do not code additional things. Once you develop the minimum code to pass the test, run the test again (step two); if it passes, go to the next step, if not, fix it and run the test again. Improve the test: If you think it is possible to improve the code you wrote, do it and run the tests again (step two). If you think it is perfect, then write a new unit test (step one). 
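The book's examples are in PHP, but the red-green-refactor cycle itself is language-agnostic. Purely as an illustrative sketch (the Cart class and the use of JUnit 5 are assumptions made for this example, not taken from the book), one iteration might look like this:

```java
// Red: write the simplest failing example first. This test describes the
// behavior we want before any production code exists, so on the first run
// it fails (or does not even compile).
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class CartTest {
    @Test
    void totalOfAnEmptyCartIsZero() {
        Cart cart = new Cart();
        assertEquals(0, cart.total());
    }
}

// Green: write only the minimum code that makes the example pass -
// no extra features, no speculative design.
class Cart {
    int total() {
        return 0;
    }
}
```

Once the test is green, the refactor step cleans up naming and duplication while the test keeps guarding the behavior; the next failing example (for instance, a cart with one item) then drives the next increment.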
To do TDD, it is necessary to write the tests before implementing the function; if the tests are written after the implementation has started, it is not TDD, it is just testing. If we implement the application without testing and only write unit tests once it is finished, or if we start creating unit tests partway through the process, we are doing classic testing and we are not getting the TDD benefits. When we develop the functions without prior testing, the abstract idea of the domain problem in our mind can be wrong, or it may be clear at the start but change during the development process, or the concepts can get mixed up. By writing the tests afterwards, we are only checking whether the ideas in our mind were correct once we have finished the implementation, so we will probably have to change some methods or even whole functionalities after spending time coding. Obviously, testing is always better than not testing, but doing TDD is still better than just classic testing. Why should I use TDD? TDD is the answer to questions such as: Where shall I begin? How can I do it? How can I write code that can be modified without breaking anything? How can I know what I have to implement? The goal is not to write many unit tests without purpose, but to design the application properly following the requirements. In TDD, we do not think about implementing functions; we think about good examples of functions related to the domain problem in order to remove the ambiguity created by the domain problem. In other words, by doing TDD, we reproduce a specific function or use case in as many examples as are needed to describe the function or task without ambiguity or misinterpretation. TDD can be the best way to document your application. Using other methodologies of software development, we start by thinking about how the architecture is going to be, what pattern is going to be used, how the communication between microservices is going to be, and so on, but what happens if, once we have all this planned, we realize that it is not necessary? How much time will pass until we realize that? How much effort and money will we spend? TDD defines the architecture of our application by creating little examples in many iterations until we realize what the architecture is; the examples slowly show us the steps to follow in order to define the best structures, patterns, or tools to use, avoiding expenditure of resources during the first stages of our application. This does not mean that we are working without an architecture; obviously, we have to know whether our application is going to be a website or a mobile app and use a proper framework. What is the interoperability in the application going to be? In our case, it will be an application based on microservices, and that gives us enough support to start creating the first unit tests. The architectures that we remove are the architectures on top of the architecture; in other words, the usual up-front guidelines for developing an application. TDD produces an architecture without ambiguity from unit testing. TDD is not a cure-all: in other words, it does not give the same results to a senior developer as to a junior developer, but it is useful for the entire team. 
Let's look at some advantages of using TDD:

Code reuse: Every functionality is created with only the code necessary to pass the tests in the second stage (green), which lets you see whether other functions use the same code structure or parts of a specific function, helping you to reuse code you wrote previously.
Teamwork is easier: It allows you to be confident in your team colleagues. Some architects or senior developers do not trust less experienced developers and feel they need to check their code before committing changes, creating a bottleneck; TDD helps to build trust in developers with less experience.
Increased communication between team colleagues: Communication becomes more fluent, as the team shares its knowledge about the project through the unit tests.
Avoids overdesigning the application in the early stages: As we said before, doing TDD gives you an overview of the application little by little, avoiding the creation of useless structures or patterns in your project that you might throw away in later stages.
Unit tests are the best documentation: The best way to get a good view of a specific functionality is to read its unit tests; they explain how it works better than prose.
Allows discovering more use cases in the design stage: With every test you create, you understand better how the functionality should work and all the possible stages that a functionality can have.
Increases the feeling of a job well done: With every commit of your code, you will have the feeling that it was done properly, because the rest of the unit tests pass without errors, so you will not be worried about having broken other functionalities.
Increases software quality: During the refactoring step, we spend our effort making the code more efficient and maintainable, checking that the whole project still works properly after the changes.

TDD algorithm

The technical concepts and steps of the TDD algorithm are simple and clear, and the proper way to apply it improves with practice. There are only three steps, called red, green, and refactor.

Red – Writing the unit tests

It is possible to write a test even when the code has not been written yet; you just need to think about whether it is possible to write a specification before implementing it. So, in this first step you should consider that the unit test you start writing is not so much a unit test as an example or specification of the functionality. In TDD, this first example or specification is not immovable; in other words, the unit test can be modified in the future.

Before starting to write the first unit test, it is necessary to think about what the Software Under Test (SUT) is going to be: how the SUT code is going to look and how we will check that it works the way we want it to. The way TDD works drives us to first design what is most comfortable and clear, as long as it fits the requirements.

Green – Make the code work

Once the example is written, we have to code the minimum required to make it pass the test; in other words, turn the unit test green. It does not matter if the code is ugly and not optimized; that will be our task in the next step and in later iterations. In this step, the important thing is to write only the code necessary for the requirements, without anything extra. This does not mean writing without thinking about the functionality, but thinking about it in order to be efficient.
It looks easy, but the first time you will realize that you write extra code. If you concentrate on this step, new questions will appear about the SUT's behavior with different inputs, but you should stay disciplined and avoid writing extra code for other functionalities related to the current one. Instead of coding them, take notes so you can turn them into functionalities in the next iterations.

Refactor – Eliminate redundancy

Refactoring is not the same as rewriting code. You should be able to change the design without changing the behavior. In this step, you should remove duplication from your code and check whether the code follows the principles of good practice, thinking about the efficiency, clarity, and future maintainability of the code. This part depends on the experience of each developer.

The key to good refactoring is taking small steps. To refactor a functionality, the best way is to change a small part and then execute all the available tests; if they pass, continue with another small part, until you are happy with the result.

Behavior-driven development

Behavior-Driven Development (BDD) is a process that broadens the TDD technique and mixes it with other design ideas and business analysis provided to the developers, in order to improve software development. In BDD, we test scenarios and the behavior of the classes needed to meet those scenarios, which can be composed of many classes. It is very useful to use a DSL so that the customer, project owner, business analyst, and developers share a common language. The goal is to have a ubiquitous language.

What is BDD?

As we said before, BDD is an Agile technique based on TDD and ATDD that promotes collaboration across the entire project team. The goal of BDD is for the entire team to understand what the customer wants, and for the customer to know what the rest of the team understood from their specifications. Most of the time, when a project starts, the developers do not have the same point of view as the customer, and during the development process the customer realizes that maybe they did not explain the requirements well, or the developers did not understand them properly, which adds more time for changing the code to meet the customer's needs.

So, BDD means writing test cases in human language, using rules, or in a ubiquitous language, so that the customer and developers can both understand them. It also defines a DSL for the tests.

How does it work?

It is necessary to define the features as user stories (we will explain what this is in the ATDD section of this article) and their acceptance criteria. Once the user story is defined, we have to focus on the possible scenarios, which describe the project behavior for a concrete user or situation, using the DSL. The steps are: Given [context], When [event occurs], Then [outcome]. To sum up, the scenario defined for a user story gives the acceptance criteria to check whether the feature is done (an illustrative scenario is shown below, after the introduction to ATDD).

Acceptance Test-Driven Development

Perhaps the most important methodology in a project is Acceptance Test-Driven Development (ATDD), also known as Story Test-Driven Development (STDD); it is TDD, but at a different level. The acceptance (or customer) tests are the written criteria for the project meeting the business requirements that the customer demands. They are examples (like the examples in TDD) written by the project owner. They are the starting point of development for each iteration and the bridge between Scrum and agile development.
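As an illustration only (this scenario is not taken from the book), an acceptance scenario for a hypothetical customer-registration user story, written in the Given/When/Then form described in the BDD section, might look like this:

Feature: Customer registration
  Scenario: A new customer registers with a valid e-mail address
    Given a visitor who is not yet registered
    When the visitor submits the registration form with a valid e-mail address
    Then a customer profile is created
    And a confirmation e-mail is sent to that address

Tools such as Behat can execute scenarios written in this style against PHP code, but the main value of the format is that the customer can read and validate it before any code exists.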
In ATDD, we start the implementation of our project in a different way from traditional methodologies. The business requirements written in human language are replaced by executable examples agreed upon by the team members and the customer. This is not about replacing the whole documentation, but only a part of the requirements.

The advantages of using ATDD are the following:

Real examples and a common language for the entire team to understand the domain
It allows identifying the domain rules properly
It is possible to know whether a user story is finished in each iteration
The workflow works from the first steps
Development does not start until the tests are defined and accepted by the team

ATDD algorithm

The algorithm of ATDD is like that of TDD, but it reaches more people than just the developers; in other words, when doing ATDD, the tests for each story are written in a meeting that includes the product owners, developers, and QA technicians, because the entire team must understand what needs to be done and why, so they can check whether that is what the code does. The ATDD cycle is depicted in the following diagram:

Discuss

The starting point of the ATDD algorithm is the discussion. In this first step, the business has a meeting with the customer to clarify how the application should work, and the analyst should create the user stories from that conversation. They should also be able to explain the conditions of satisfaction of every user story so that they can be translated into examples. By the end of the meeting, the examples should be clear and concise, so that we get a list of examples for the user stories that covers all the customer's needs, reviewed and understood by them. The entire team will also have a project overview, so they understand the business value of each user story, and if a user story is too big, it can be divided into smaller user stories, with the first one used for the first iteration of this process.

Distill

High-level acceptance tests are written by the customer and the development team. In this step, the writing of the test cases derived from the examples in the discussion step begins, and the entire team can take part in the discussion, helping to clarify the information or specify the real needs behind it. The tests should cover all the examples discovered in the discussion step, and extra tests can be added bit by bit during this process as we understand the functionality better. At the end of this step, we will have the necessary tests written in human language, so the entire team (including the customer) can understand what they are going to do in the next step. These tests can be used as documentation.

Develop

In this step, the development of the acceptance test cases is started by the development team and the project owner. The methodology to follow in this step is the same as in TDD: the developers create a test, watch it fail (red), and then write the minimum amount of code to make it pass (green). Once the acceptance tests are green, the work should be verified and tested so that it is ready to be delivered. During this process, the developers may find new scenarios that need to be added to the tests; if a scenario requires a large amount of work, it can be pushed into its own user story. At the end of this step, we will have software that passes the acceptance tests, and perhaps more comprehensive tests as well.
Demo

The created functionality is shown by running the acceptance test cases and manually exploring the features of the new functionality. After the demonstration, the team discusses whether the user story was done properly and meets the product owner's needs, and decides whether it can continue with the next story.

Tools

Now that we know more about TDD and BDD, it is time to explain a few tools you can use in your development workflow. There are a lot of tools available, but we will only explain the most commonly used ones.

Composer

Composer is a PHP tool used to manage software dependencies. You only need to declare the libraries your project needs, and Composer will manage them, installing and updating them when necessary. The tool has only a few requirements: if you have PHP 5.3.2+, you are ready to go. In the case of a missing requirement, Composer will warn you.

You could install this dependency manager on your development machine, but since we are using Docker, we are going to install it directly in our PHP-FPM containers. Installing Composer in Docker is very easy; you only need to add the following instruction to the Dockerfile:

RUN curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/bin/ --filename=composer

PHPUnit

Another tool we need for our project is PHPUnit, a unit test framework. As before, we will add this tool to our PHP-FPM containers to keep our development machine clean. If you are wondering why we are not installing anything on our development machine except Docker, the answer is simple: keeping everything in the containers helps you avoid conflicts with other projects and gives you the flexibility to change versions without worrying too much.

Add the following RUN command to your PHP-FPM Dockerfile, and you will have the latest PHPUnit version installed and ready to use:

RUN curl -sSL https://phar.phpunit.de/phpunit.phar -o /usr/bin/phpunit && chmod +x /usr/bin/phpunit

Now that we have all our requirements, it is time to install our PHP framework and start doing some TDD. Later, we will continue updating our Docker environment with new tools.

We chose Lumen for our example. Please feel free to adapt all the examples to your favorite framework. Our source code will live inside our containers, but at this point of development, we do not want immutable containers; we want every change we make to our code to be available instantaneously in our containers, so we will use a container as a storage volume.

To create a container with our source code and use it as a storage volume, we only need to edit our docker-compose.yml and create one source container per microservice, as follows:

source_battle:
    image: nginx:stable
    volumes:
        - ../source/battle:/var/www/html
    command: "true"

The preceding piece of code creates a container named source_battle, and it stores our battle source (located at ../source/battle relative to the docker-compose.yml path). Once we have our source container available, we can edit each of our services and assign the volume. For instance, we can add the following lines to our microservice_battle_fpm and microservice_battle_nginx container descriptions:

volumes_from:
    - source_battle

Our battle source will be available in our source container at the path /var/www/html, and the remaining step to install Lumen is a simple Composer execution.
First, you need to make sure that your infrastructure is up, with a simple command, as follows:

$ docker-compose up

The preceding command spins up our containers and outputs the log to the standard IO. Now that we are sure that everything is up and running, we need to enter our PHP-FPM containers and install Lumen. If you need to know the names assigned to each of your containers, you can run $ docker ps and copy the container name. As an example, we are going to enter the battle PHP-FPM container with the following command:

$ docker exec -it docker_microservice_battle_fpm_1 /bin/bash

The preceding command opens an interactive shell in your container, so you can do anything you want; let's install Lumen with a single command:

# cd /var/www/html && composer create-project --prefer-dist laravel/lumen .

Repeat the preceding commands for each of your microservices. Now you have everything ready to start writing unit tests and coding your application.

Summary

In this article, you learned about test-driven development, behavior-driven development, acceptance test-driven development, and PHPUnit.

Resources for Article:

Further resources on this subject:
Running Simpletest and PHPUnit [Article]
Understanding PHP basics [Article]
The Multi-Table Query Generator using phpMyAdmin and MySQL [Article]

Packt
28 Dec 2016
13 min read

Examining the encoding/json Package with Go

In this article by Nic Jackson, author of the book Building Microservices with Go, we will examine the encoding/json package to see just how easy Go makes it for us to use JSON objects for our requests and responses.

(For more resources related to this topic, see here.)

Reading and writing JSON

Thanks to the encoding/json package, which is built into the standard library, encoding and decoding JSON to and from Go types is both fast and easy. It implements the simple Marshal and Unmarshal functions; however, if we need them, the package also provides Encoder and Decoder types, which allow us greater control when reading and writing streams of JSON data. In this section, we are going to examine both of these approaches, but first let's take a look at how simple it is to convert a standard Go struct into its corresponding JSON string.

Marshalling Go structs to JSON

To encode JSON data, the encoding/json package provides the Marshal function, which has the following signature:

func Marshal(v interface{}) ([]byte, error)

This function takes one parameter of the empty interface type, so that's pretty much any object you can think of, since interface{} represents any type in Go. It returns a tuple of ([]byte, error). You will see this return style quite frequently in Go. Some languages implement a try...catch approach, which encourages an error to be thrown when an operation cannot be performed. Go suggests the (return type, error) pattern, where the error is nil when an operation succeeds.

In Go, unhandled errors are a bad thing, and while the language does implement the panic and recover functions, which resemble exception handling in other languages, the situations in which you should use them are quite different (The Go Programming Language, Donovan and Kernighan). In Go, panic causes normal execution to stop, and all deferred function calls in the Go routine are executed; the program will then crash with a log message. It is generally used for unexpected errors that indicate a bug in the code, and good, robust Go code will attempt to handle these runtime exceptions and return a detailed error object back to the calling function.

This pattern is exactly what is implemented with the Marshal function. If Marshal cannot create a JSON-encoded byte array from the given object, which could be due to a runtime panic, then this is captured and an error object detailing the problem is returned to the caller.

Let's try this out, expanding on our existing example. Instead of simply printing a string from our handler, let's create a simple struct for the response and return that:

type helloWorldResponse struct {
    Message string
}

In our handler, we will create an instance of this object, set the message, and then use the Marshal function to encode it to a string before returning. Let's see what that will look like:

func helloWorldHandler(w http.ResponseWriter, r *http.Request) {
    response := helloWorldResponse{Message: "Hello World"}
    data, err := json.Marshal(response)
    if err != nil {
        panic("Ooops")
    }

    fmt.Fprint(w, string(data))
}

Now when we rerun our program and refresh our browser, we'll see the following output rendered in valid JSON:

{"Message":"Hello World"}

This is awesome, but the default behavior of Marshal is to take the literal name of the field and use that as the field in the JSON output. What if I prefer to use camel case and would rather see message? Could we just rename the field in our struct to message?
Unfortunately, we can't, because in Go, lowercase properties are not exported. Marshal will ignore these and will not include them in the output.

All is not lost: the encoding/json package implements struct field tags, which allow us to change the output for the property to anything we choose. The example code is as follows:

type helloWorldResponse struct {
    Message string `json:"message"`
}

Using the struct field's tags, we have greater control over how the output will look. In the preceding example, when we marshal this struct, the output from our server would be the following:

{"message":"Hello World"}

This is exactly what we want, but we can use field tags to control the output even further. We can convert object types and even ignore a field altogether if we need to:

type helloWorldResponse struct {
    // change the output field to be "message"
    Message string `json:"message"`
    // do not output this field
    Author string `json:"-"`
    // do not output the field if the value is empty
    Date string `json:",omitempty"`
    // convert output to a string and rename "id"
    Id int `json:"id,string"`
}

Channels, complex types, and functions cannot be encoded in JSON. Attempting to encode these types will result in an UnsupportedTypeError being returned by the Marshal function. It also can't represent cyclic data structures, so if your struct contains a circular reference, then Marshal will result in an infinite recursion, which is never a good thing for a web request.

If we want to export our JSON pretty-formatted with indentation, we can use the MarshalIndent function, which allows you to pass additional string parameters to specify the prefix and what you would like the indent to be (two spaces, not a tab, right?):

func MarshalIndent(v interface{}, prefix, indent string) ([]byte, error)

The astute reader might have noticed that we are marshalling our struct into a byte array and then writing that to the response stream. This does not seem to be particularly efficient, and in fact, it is not. Go provides encoders and decoders, which can write directly to a stream. Since we already have a stream with the ResponseWriter interface, let's do just that. Before we do so, I think we need to look at the ResponseWriter interface a little to see what is going on there.

ResponseWriter is an interface that defines three methods:

// Header returns the map of headers that will be sent by the
// WriteHeader method.
Header() Header

// Write writes the data to the connection. If WriteHeader has not
// already been called, then Write will call
// WriteHeader(http.StatusOK).
Write([]byte) (int, error)

// WriteHeader sends an HTTP response header with the given status code.
WriteHeader(int)

If we have a ResponseWriter, how can we use it with fmt.Fprint(w io.Writer, a ...interface{})? This method requires a Writer interface as a parameter, and we have a ResponseWriter. If we look at the signature for Writer, we can see that it is the following:

Write(p []byte) (n int, err error)

Because the ResponseWriter interface implements this method, it also satisfies the Writer interface; therefore, any object that implements ResponseWriter can be passed to any function that expects Writer. Amazing! Go rocks, but we don't have an answer to our question yet: is there any better way to send our data to the output stream without marshalling it to a temporary byte array before we return it? The encoding/json package has a function called NewEncoder.
This returns an Encoder object, which can be used to write JSON straight to an open writer, and guess what: we have one of those:

func NewEncoder(w io.Writer) *Encoder

So instead of storing the output of Marshal in a byte array, we can write it straight to the HTTP response, as shown in the following code:

func helloWorldHandler(w http.ResponseWriter, r *http.Request) {
    response := helloWorldResponse{Message: "Hello World"}
    encoder := json.NewEncoder(w)
    encoder.Encode(&response)
}

We will look at benchmarking in a later chapter, but to see why this is important, here is a simple benchmark that checks the two methods against each other; have a look at the output:

go test -v -run="none" -bench=. -benchtime="5s" -benchmem
testing: warning: no tests to run
PASS
BenchmarkHelloHandlerVariable    10000000    1211 ns/op    248 B/op    5 allocs/op
BenchmarkHelloHandlerEncoder     10000000     662 ns/op      8 B/op    1 allocs/op
ok    github.com/nicholasjackson/building-microservices-in-go/chapter1/bench    20.650s

Using the Encoder rather than marshalling to a byte array is nearly 50% faster. We are dealing with nanoseconds here, so that time may seem irrelevant, but it isn't; this was two lines of code. If you have that level of inefficiency throughout the rest of your code, your application will run slower, you will need more hardware to satisfy the load, and that will cost you money. There is nothing clever in the differences between the two methods; all we have done is understand how the standard packages work and choose the correct option for our requirements. That is not performance tuning; that is understanding the framework.

Unmarshalling JSON to Go structs

Now that we have learned how to send JSON back to the client, what if we need to read input before returning the output? We could use URL parameters, and we will see what that is all about in the next chapter, but usually you will need more complex data structures, which means the service has to accept JSON as part of an HTTP POST request. Applying techniques similar to those we learned in the previous section (to write JSON), reading JSON is just as easy. To decode JSON into a struct, the encoding/json package provides us with the Unmarshal function:

func Unmarshal(data []byte, v interface{}) error

The Unmarshal function works in the opposite way to Marshal: it allocates maps, slices, and pointers as required. Incoming object keys are matched using either the struct field name or its tag and will work with a case-insensitive match; however, an exact match is preferred. Like Marshal, Unmarshal will only set exported struct fields: those that start with an upper-case letter.

We start by adding a new struct to represent the request. Unmarshal can also decode the JSON into a plain interface{}, which would hold one of the following types:

map[string]interface{} // for JSON objects
[]interface{} // for JSON arrays

Which type it is depends on whether our JSON is an object or an array. In my opinion, it is much clearer to the readers of our code if we explicitly state what we are expecting as a request. We can also save ourselves work by not having to manually cast the data when we come to use it.
Remember two things:

You do not write code for the compiler; you write code for humans to understand
You will spend more time reading code than you do writing it

We are going to do ourselves a favor by taking these two points into account and creating a simple struct to represent our request, which will look like this:

type helloWorldRequest struct {
    Name string `json:"name"`
}

Again, we are going to use struct field tags, because while we could let Unmarshal do case-insensitive matching so that {"name": "World"} would unmarshal into the struct just like {"Name": "World"}, when we specify a tag, we are being explicit about the request form, and that is a good thing. In terms of speed and performance, it is also about 10% faster, and remember: performance matters.

To access the JSON sent with the request, we need to take a look at the http.Request object passed to our handler. The following listing does not show all the fields in the request, just the ones we are going to be dealing with immediately. For the full documentation, I recommend checking out the docs at https://godoc.org/net/http#Request.

type Request struct {
    ...
    // Method specifies the HTTP method (GET, POST, PUT, etc.).
    Method string
    // Header contains the request header fields received by the server.
    // The Header type is a map[string][]string.
    Header Header
    // Body is the request's body.
    Body io.ReadCloser
    ...
}

The JSON that has been sent with the request is accessible in the Body field. The Body field implements the io.ReadCloser interface as a stream and does not return []byte or string data. If we need the data contained in the body, we can simply read it into a byte array, as in the following example:

body, err := ioutil.ReadAll(r.Body)
if err != nil {
    http.Error(w, "Bad request", http.StatusBadRequest)
    return
}

Here is something we need to remember: we are not calling Body.Close(); if we were making a call with a client, we would need to do this, as it is not closed automatically; however, when used in a ServeHTTP handler, the server automatically closes the request stream.

To see how this all works inside our handler, we can look at the following handler:

func helloWorldHandler(w http.ResponseWriter, r *http.Request) {

    body, err := ioutil.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Bad request", http.StatusBadRequest)
        return
    }

    var request helloWorldRequest
    err = json.Unmarshal(body, &request)
    if err != nil {
        http.Error(w, "Bad request", http.StatusBadRequest)
        return
    }

    response := helloWorldResponse{Message: "Hello " + request.Name}

    encoder := json.NewEncoder(w)
    encoder.Encode(response)
}

Let's run this example and see how it works; to test it, we can simply use curl to send a request to the running server. If you feel more comfortable using a GUI tool, then Postman, which is available for the Google Chrome browser, will work just fine. Otherwise, feel free to use your preferred tool.

$ curl localhost:8080/helloworld -d '{"name":"Nic"}'

You should see the following response:

{"message":"Hello Nic"}

What do you think will happen if you do not include a body with your request?

$ curl localhost:8080/helloworld

If you guessed correctly that you would get an "HTTP status 400 Bad Request" error, then you win a prize.
The http.Error function replies to the request with the given message and status code:

func Error(w ResponseWriter, error string, code int)

Once we have sent this, we need to return, stopping further execution of the function, as Error does not close the ResponseWriter or return flow to the calling function automatically.

You might think you are done, but have a go and see whether you can improve the performance of the handler. Think about the things we discussed when marshalling JSON.

Got it? Well, if not, here is the answer: again, all we are doing is using Decoder, which is the counterpart of the Encoder we used when writing JSON, as shown in the following code example. This nets an instant 33% performance increase, and with less code, too.

func helloWorldHandler(w http.ResponseWriter, r *http.Request) {

    var request helloWorldRequest
    decoder := json.NewDecoder(r.Body)

    err := decoder.Decode(&request)
    if err != nil {
        http.Error(w, "Bad request", http.StatusBadRequest)
        return
    }

    response := helloWorldResponse{Message: "Hello " + request.Name}

    encoder := json.NewEncoder(w)
    encoder.Encode(response)
}

Now that you can see just how easy it is to encode and decode JSON with Go, I would recommend taking five minutes to dig through the documentation for the encoding/json package, as there is a whole lot more you can do with it: https://golang.org/pkg/encoding/json/

Summary

In this article, we looked at encoding and decoding data using the encoding/json package.

Resources for Article:

Further resources on this subject:
Microservices – Brave New World [article]
A capability model for microservices [article]
Breaking into Microservices Architecture [article]

Packt
17 Jun 2016
19 min read

A capability model for microservices

In this article by Rajesh RV, the author of Spring Microservices, you will learn about the concepts of microservices. Rather than sticking to definitions, it is better to understand microservices by examining some common characteristics seen across many successful microservices implementations. Spring Boot is an ideal framework for implementing microservices, and in this article we will examine how to implement microservices using Spring Boot with an example use case. Beyond the services themselves, we have to be aware of the challenges around microservices implementation, so this article will also discuss some of the common challenges. A successful microservices implementation has to have a set of common capabilities; in this article, we will establish a microservices capability model that can be used as a technology-neutral framework to implement large-scale microservices.

What are microservices?

Microservices is an architecture style used by many organizations today as a game changer to achieve a high degree of agility, speed of delivery, and scale. Microservices give us a way to develop more physically separated, modular applications.

Microservices were not invented. Many organizations, such as Netflix, Amazon, and eBay, successfully used the divide-and-conquer technique to functionally partition their monolithic applications into smaller atomic units, each performing a single function. These organizations solved a number of prevailing issues they had experienced with their monolithic applications. Following their success, many other organizations started adopting this as a common pattern to refactor their monolithic applications, and evangelists later termed this pattern the microservices architecture. Microservices originated from the idea of Hexagonal Architecture, coined by Alistair Cockburn. Hexagonal Architecture is also known as the Ports and Adapters pattern.

Microservices is an architectural style, or an approach to building IT systems, as a set of business capabilities that are autonomous, self-contained, and loosely coupled.

The preceding diagram depicts a traditional N-tier application architecture with a presentation layer, business layer, and database layer. Modules A, B, and C represent three different business capabilities. The layers in the diagram represent a separation of architectural concerns. Each layer holds all three business capabilities pertaining to that layer: the presentation layer has the web components of all three modules, the business layer has the business components of all three modules, and the database layer hosts the tables of all three modules. In most cases, layers can be physically spread out, whereas modules within a layer are hardwired.

Let's now examine a microservices-based architecture, as follows:

As we can note in the diagram, the boundaries are inverted in the microservices architecture. Each vertical slice represents a microservice. Each microservice has its own presentation layer, business layer, and database layer. Microservices are aligned toward business capabilities. By doing so, changes to one microservice do not impact others.

There is no standard for communication or transport mechanisms for microservices. In general, microservices communicate with each other using widely adopted lightweight protocols, such as HTTP and REST, or messaging protocols, such as JMS or AMQP. In specific cases, one might choose more optimized communication protocols, such as Thrift, ZeroMQ, Protocol Buffers, or Avro.
As microservices are more aligned to business capabilities and have independently manageable lifecycles, they are the ideal choice for enterprises embarking on DevOps and cloud. DevOps and cloud are two other facets of microservices.

Microservices are self-contained, independently deployable, and autonomous services that take full responsibility for a business capability and its execution. They bundle all dependencies, including library dependencies and execution environments, such as web servers and containers or virtual machines that abstract physical resources. These self-contained services assume single responsibility and are well enclosed within a bounded context.

Microservices – the honeycomb analogy

The honeycomb is an ideal analogy for the evolutionary microservices architecture. In the real world, bees build a honeycomb by aligning hexagonal wax cells. They start small, using different materials to build the cells, and construction is based on what is available at the time of building. Repetitive cells form a pattern and result in a strong fabric structure. Each cell in the honeycomb is independent but also integrated with other cells. By adding new cells, the honeycomb grows organically into a big, solid structure. The content inside each cell is abstracted and is not visible outside. Damage to one cell does not damage other cells, and bees can reconstruct these cells without impacting the overall honeycomb.

Characteristics of microservices

The microservices definition discussed at the beginning of this article is arbitrary. Evangelists and practitioners have strong but sometimes differing opinions on microservices. There is no single, concrete, universally accepted definition for microservices. However, all successful microservices implementations exhibit a number of common characteristics, some of which are explained as follows:

Since microservices are more or less a flavor of SOA, many of the service characteristics of SOA apply to microservices as well. In the microservices world, services are first-class citizens. Microservices expose service endpoints as APIs and abstract all their realization details. The APIs can be synchronous or asynchronous. HTTP/REST is the popular choice for APIs.
As microservices are autonomous and abstract everything behind service APIs, it is possible to have different architectures for different microservices. The internal implementation logic, architecture, and technologies, including the programming language, database, quality-of-service mechanisms, and so on, are completely hidden behind the service API.
Well-designed microservices are aligned to a single business capability, so they perform only one function. As a result, one of the common characteristics we see in most implementations is microservices with smaller footprints.
Most microservices implementations are automated to the maximum extent possible, from development to production.
Most large-scale microservices implementations have a supporting ecosystem in place. The ecosystem's capabilities include DevOps processes, centralized log management, service registries, API gateways, extensive monitoring, service routing and flow-control mechanisms, and so on.
Successful microservices implementations encapsulate logic and data within the service. This results in two unconventional situations: distributed data and logic, and decentralized governance.
A microservice example

The customer profile microservice example explained here demonstrates the implementation of a microservice and the interaction between different microservices. In this example, two microservices, Customer Profile and Customer Notification, will be developed.

As shown in the diagram, the Customer Profile microservice exposes methods to create, read, update, and delete a customer, and a registration service to register a customer. The registration process applies certain business logic, saves the customer profile, and sends a message to the Customer Notification microservice. The Customer Notification microservice accepts the message sent by the registration service and sends an e-mail message to the customer using an SMTP server. Asynchronous messaging is used to integrate the Customer Profile and Customer Notification services.

The customer microservices class domain model diagram is as shown here:

Implementing this Customer Profile microservice is not a big deal. The Spring framework, together with Spring Boot, provides all the necessary capabilities to implement this microservice without much hassle.

The key class is CustomerController, which exposes the REST endpoint for our microservice. It is also possible to use HATEOAS to expose the repository's REST services directly using the @RepositoryRestResource annotation. The following code sample shows the Spring Boot main class, called Application, and the REST endpoint definition for the registration of a new customer:

@SpringBootApplication
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

@RestController
class CustomerController {
    //other code here
    @RequestMapping(path = "/register", method = RequestMethod.POST)
    Customer register(@RequestBody Customer customer) {
        return customerComponent.register(customer);
    }
}

CustomerController invokes a component class, CustomerComponent. The component class/bean handles all the business logic. CustomerRepository is a Spring Data JPA repository defined to handle the persistence of the Customer entity. The whole application is then deployed as a Spring Boot application by building a standalone jar rather than the conventional war file. Spring Boot encapsulates the server runtime in the fat jar it produces; by default, it is an instance of the Tomcat server.

CustomerComponent, in addition to calling the CustomerRepository class, sends a message to the RabbitMQ queue, where the CustomerNotification component is listening. This can be easily achieved in Spring using the RabbitMessagingTemplate class, as shown in the following Sender implementation:

@Component
class CustomerComponent {
    //other code here

    Customer register(Customer customer) {
        customerRepository.save(customer);
        sender.send(customer.getEmail());
        return customer;
    }
}

@Component
@Lazy
class Sender {
    RabbitMessagingTemplate template;

    @Autowired
    Sender(RabbitMessagingTemplate template) {
        this.template = template;
    }

    @Bean
    Queue queue() {
        return new Queue("CustomerQ", false);
    }

    public void send(String message) {
        template.convertAndSend("CustomerQ", message);
    }
}

The receiver on the other side consumes the message using RabbitListener and sends out an e-mail using the JavaMailSender component.
Execute the following code:

@Component
class Receiver {
    @Autowired
    private JavaMailSender javaMailService;

    @Bean
    Queue queue() {
        return new Queue("CustomerQ", false);
    }

    @RabbitListener(queues = "CustomerQ")
    public void processMessage(String email) {
        System.out.println(email);
        SimpleMailMessage mailMessage = new SimpleMailMessage();
        mailMessage.setTo(email);
        mailMessage.setSubject("Registration");
        mailMessage.setText("Successfully Registered");
        javaMailService.send(mailMessage);
    }
}

In this case, CustomerNotification is our second Spring Boot microservice, and instead of a REST endpoint, it only exposes a message listener endpoint.

Microservices challenges

In the previous section, you learned about the right design decisions to be made and the trade-offs to be applied. In this section, we will review some of the challenges with microservices. Take a look at the following list:

Data islands: Microservices abstract their own local transactional store, which is used for their own transactional purposes. The type of store and the data structure will be optimized for the services offered by the microservice. This can lead to data islands and, hence, challenges around aggregating data from different transactional stores to derive meaningful information.
Logging and monitoring: Log files are a good source of information for analysis and debugging. As each microservice is deployed independently, they emit separate logs, maybe to a local disk. This results in fragmented logs. When we scale services across multiple machines, each service instance produces a separate log file, which makes it extremely difficult to debug and understand the behavior of the services through log mining.
Dependency management: Dependency management is one of the key issues in large microservices deployments. How do we ensure the chattiness between services is manageable? How do we identify and reduce the impact of a change? How do we know whether all the dependent services are up and running? How will the service behave if one of the dependent services is not available?
Organization's culture: One of the biggest challenges in microservices implementation is the organization's culture. An organization following waterfall development or heavyweight release management processes with infrequent release cycles will find microservices development a challenge. Insufficient automation is also a challenge for microservices deployments.
Governance challenges: Microservices impose decentralized governance, which is quite in contrast to traditional SOA governance. Organizations may find it hard to cope with this change, and this could negatively impact microservices development. How do we know who is consuming a service? How do we ensure service reuse? How do we define which services are available in the organization? How do we ensure that enterprise policies are enforced?
Operation overheads: Microservices deployments generally increase the number of deployable units and virtual machines (or containers). This adds significant management overhead and cost of operations. With a single application, a dedicated set of containers or virtual machines in an on-premises data center may not make much sense unless the business benefit is high. With many microservices, the number of Configurable Items (CIs) is too high, and the number of servers on which these CIs are deployed might also be unpredictable.
This makes it extremely difficult to manage data in a traditional Configuration Management Database (CMDB).
Testing microservices: Microservices also pose a challenge for the testability of services. In order to achieve full service functionality, one service may rely on another service, which in turn may rely on yet another service, either synchronously or asynchronously. The issue is how we test an end-to-end service to evaluate its behavior. The dependent services may or may not be available at the time of testing.
Infrastructure provisioning: As briefly touched upon under operation overheads, manual deployment can severely challenge microservices rollouts. If a deployment has manual elements, the deployer or operational administrators have to know the running topology, manually reroute traffic, and then deploy the applications one by one until all the services are upgraded. With many server instances running, this could lead to significant operational overhead. Moreover, the chance of error is high with this manual approach.

Beyond just services – the microservices capability model

Microservices are not as simple as the Customer Profile implementation we discussed earlier. This is especially true when deploying hundreds or thousands of services. In many cases, an improper microservices implementation can lead to a number of challenges, as mentioned before. Any successful Internet-scale microservices deployment requires a number of additional surrounding capabilities. The following diagram depicts the microservices capability model:

The capability model is broadly classified into four areas, as follows:

Core capabilities, which are part of the microservices themselves
Supporting capabilities, which are software solutions supporting core microservice implementations
Infrastructure capabilities, which are infrastructure-level expectations for a successful microservices implementation
Governance capabilities, which are more about process, people, and reference information

Core capabilities

The core capabilities are explained here:

Service listeners (HTTP/messaging): If microservices are enabled for HTTP-based service endpoints, then the HTTP listener is embedded within the microservice, thereby eliminating the need for any external application server. The HTTP listener is started at application startup. If the microservice is based on asynchronous communication, then a message listener is started instead of an HTTP listener. Optionally, other protocols can also be considered. There may not be any listeners if the microservice is a scheduled service. Spring Boot and Spring Cloud Streams provide this capability.
Storage capability: Microservices have storage mechanisms to store state or transactional data pertaining to the business capability. This is optional, depending on the capabilities that are implemented. The storage could be a physical store (an RDBMS such as MySQL, or a NoSQL store such as Hadoop, Cassandra, Neo4j, Elasticsearch, and so on), or it could be an in-memory store (a cache such as Ehcache, or a data grid such as Hazelcast, Infinispan, and so on).
Business capability definition: This is the core of the microservice, where the business logic is implemented. It can be implemented in any applicable language, such as Java, Scala, Clojure, Erlang, and so on. All the business logic required to fulfil the function is embedded within the microservice itself.
Event sourcing: Microservices send out state changes to the external world without really worrying about the targeted consumers of these events. The events could be consumed by other microservices, by supporting services such as audit (by replication), by external applications, and so on. This allows other microservices and applications to respond to state changes.
Service endpoints and communication protocols: These define the APIs for external consumers to consume. They can be synchronous endpoints or asynchronous endpoints. Synchronous endpoints could be based on REST/JSON or other protocols such as Avro, Thrift, Protocol Buffers, and so on. Asynchronous endpoints are implemented through Spring Cloud Streams backed by RabbitMQ, another messaging server, or other messaging-style implementations such as ZeroMQ.
The API gateway: The API gateway provides a level of indirection by either proxying service endpoints or composing multiple service endpoints. The API gateway is also useful for policy enforcement and may provide real-time load-balancing capabilities. There are many API gateways available in the market; Spring Cloud Zuul, Mashery, Apigee, and 3scale are some examples of API gateway providers.
User interfaces: Generally, user interfaces are also part of microservices, allowing users to interact with the business capabilities realized by the microservices. These can be implemented in any technology and are channel and device agnostic.

Infrastructure capabilities

Certain infrastructure capabilities are required for a successful deployment and to manage large-scale microservices. When deploying microservices at scale, not having proper infrastructure capabilities can be challenging and can lead to failures.

Cloud: Microservices implementation is difficult in a traditional data center environment with long lead times to provision infrastructure, and a large amount of infrastructure dedicated per microservice may not be very cost effective. Managing the infrastructure internally in a data center increases the cost of ownership and of operations. A cloud-like infrastructure is better for microservices deployment.
Containers or virtual machines: Managing large numbers of physical machines is not cost effective and is also hard. With physical machines, it is also hard to handle automatic fault tolerance. Virtualization is adopted by many organizations because of its ability to provide optimal use of physical resources and resource isolation; it also reduces the overhead of managing large physical infrastructure components. Containers are the next generation of virtual machines. VMware, Citrix, and so on provide virtual machine technologies; Docker, Drawbridge, Rocket, and LXD are some containerization technologies.
Cluster control and provisioning: Once we have a large number of containers or virtual machines, it is hard to manage and maintain them. Cluster control tools provide a uniform operating environment on top of the containers and share the available capacity across multiple services. Apache Mesos and Kubernetes are examples of cluster control systems.
Application lifecycle management: Application lifecycle management tools help to launch applications when a new container is started, or kill the application when the container shuts down. Application lifecycle management allows application deployments and releases to be scripted. It automatically detects failure scenarios and responds to them, thereby ensuring the availability of the application.
This works in conjunction with the cluster control software. Marathon partially addresses this capability.

Supporting capabilities

Supporting capabilities are not directly linked to microservices, but they are essential for large-scale microservices development.

Software-defined load balancer: The load balancer should be smart enough to understand changes in the deployment topology and respond accordingly. This moves away from the traditional approach of configuring static IP addresses, domain aliases, or cluster addresses in the load balancer. When new servers are added to the environment, the load balancer should automatically detect this and include them in the logical cluster, avoiding any manual interaction. Similarly, if a service instance is unavailable, it should be taken out of the load balancer. A combination of Ribbon, Eureka, and Zuul provides this capability in Spring Cloud Netflix.
Central log management: As explored earlier in this article, a capability is required to centralize all the logs emitted by service instances, with correlation IDs. This helps in debugging, identifying performance bottlenecks, and predictive analysis. The results could feed back into the lifecycle manager to take corrective actions.
Service registry: A service registry provides a runtime environment for services to automatically publish their availability at runtime. A registry is a good source of information for understanding the service topology at any point. Eureka from Spring Cloud, ZooKeeper, and etcd are some of the service registry tools available.
Security service: The distributed microservices ecosystem requires a central server to manage service security. This includes service authentication and token services. OAuth2-based services are widely used for microservices security. Spring Security and Spring Security OAuth are good candidates for building this capability.
Service configuration: All service configuration should be externalized, as discussed in the Twelve-Factor application principles. A central service for all configuration could be a good choice. The Spring Cloud Config server and Archaius are out-of-the-box configuration servers.
Testing tools (anti-fragile, RUM, and so on): Netflix uses Simian Army for anti-fragile testing. Mature services need consistent challenges to show how reliable the services are and how good the fallback mechanisms are. Simian Army components create various error scenarios to explore the behavior of the system under failure conditions.
Monitoring and dashboards: Microservices also require a strong monitoring mechanism, not just at the infrastructure level but also at the service level. Spring Cloud Netflix Turbine, the Hystrix dashboard, and others provide service-level information. End-to-end monitoring tools such as AppDynamics, New Relic, and Dynatrace, and other tools such as statsd, Sensu, and Spigo, can add value in microservices monitoring.
Dependency and CI management: We also need tools to discover runtime topologies, to find service dependencies, and to manage configurable items (CIs). A graph-based CMDB is the most obvious choice for managing these scenarios.
Data lakes: As discussed earlier in this article, we need a mechanism to combine data stored in different microservices and perform near real-time analytics. Data lakes are a good choice for achieving this. Data ingestion tools such as Spring Cloud Data Flow, Flume, and Kafka are used to consume data; HDFS, Cassandra, and others are used to store it.
Reliable messaging: If the communication is asynchronous, we may need a reliable messaging infrastructure service, such as RabbitMQ or any other reliable messaging service. Cloud messaging, or messaging as a service, is a popular choice for Internet-scale message-based service endpoints.

Process and governance capabilities

The last pieces of the puzzle are the process and governance capabilities required for microservices, which are:

DevOps: The key to a successful implementation is adopting DevOps. DevOps complements microservices development by supporting agile development, high-velocity delivery, automation, and better change management.
DevOps tools: DevOps tools for agile development, continuous integration, continuous delivery, and continuous deployment are essential for a successful delivery of microservices. A lot of emphasis is required on automated, functional, and real user testing, as well as synthetic, integration, release, and performance testing.
Microservices repository: A microservices repository is where the versioned binaries of microservices are placed. This could be a simple Nexus repository or a container repository such as the Docker registry.
Microservices documentation: It is important to have all microservices properly documented. Swagger or API Blueprint are helpful in achieving good microservices documentation.
Reference architecture and libraries: The reference architecture provides a blueprint at the organization level to ensure that services are developed according to certain standards and guidelines in a consistent manner. Many of these can then be translated into a number of reusable libraries that enforce service development philosophies.

Summary

In this article, you learned about the concepts and characteristics of microservices. We used the Customer Profile example to understand the concept of microservices better. We also examined some of the common challenges in large-scale microservices implementation. Finally, we established a microservices capability model that can be used to deliver successful Internet-scale microservices.
article-image-microservices-brave-new-world
Packt
17 Mar 2016
9 min read
Save for later

Microservices – Brave New World

In this article by David Gonzalez, author of the book Developing Microservices with Node.js, we will cover the need for microservices, explain the monolithic approach, and study how to build and deploy microservices.

Need for microservices

The world of software development has evolved quickly over the past 40 years. One of the key points of this evolution has been the size of these systems. From the days of MS-DOS, we have taken a hundred-fold leap into our present systems. This growth in size creates a need for better ways of organizing code and software components. Usually, when a company grows due to business needs, which is known as organic growth, the software gets organized on a monolithic architecture, as it is the easiest and quickest way of building software. After a few years (or even months), adding new features becomes harder due to the coupled nature of the created software.

Monolithic software

There are a few companies that have already started building their software using microservices, which is the ideal scenario. The problem is that not all companies can plan their software upfront. Instead of planning, these companies build the software based on the organic growth experienced: a few software components that group business flows by affinity. It is not rare to see companies having two big software components: the user-facing website and the internal administration tools. This is usually known as a monolithic software architecture. Some of these companies face big problems when trying to scale their engineering teams. It is hard to coordinate teams that build, deploy, and maintain a single software component. Clashes on releases and reintroduction of bugs are common problems that drain a big chunk of energy from the teams. One of the solutions to this problem (it also has other benefits) is to split the monolithic software into microservices so that the teams are able to specialize in a few smaller, autonomous, and isolated software components that can be versioned, updated, and deployed without interfering with the rest of the systems of the company. Splitting the monolithic architecture into microservices enables the engineering team to create isolated and autonomous units of work that are highly specialized in a given task (such as sending e-mails, processing card payments, and so on).

Microservices in the real world

Microservices are small software components that specialize in one task and work together to achieve a higher-level task. Forget about software for a second and think about how a company works. When someone applies for a job in a company, they apply for a given position: software engineer, systems administrator, or office manager. The reason for this can be summarized in one word: specialization. If you are used to working as a software engineer, you will get better with experience and add more value to the company. The fact that you don't know how to deal with a customer won't affect your performance, as it is not your area of expertise and would hardly add any value to your day-to-day work. A microservice is an autonomous unit of work that can execute one task without interfering with other parts of the system, similar to what a job position is to a company. This has a number of benefits that can be used in favor of the engineering team in order to help scale the systems of a company.
Nowadays, hundreds of systems are built using microservices-oriented architectures, including the following:

Netflix: They are one of the most popular streaming services and have built an entire ecosystem of applications that collaborate in order to provide a reliable and scalable streaming system used across the globe.

Spotify: They are one of the leading music streaming services in the world and have built this application using microservices. Every single widget of the application (which is a website exposed as a desktop app using the Chromium Embedded Framework (CEF)) is a different microservice that can be updated individually.

First, there was the monolith

A huge percentage (my estimate is around 90%) of modern enterprise software is built following a monolithic approach: huge software components that run in a single container and have a well-defined development life cycle, which goes completely against the agile principles of deliver early and deliver often (https://en.wikipedia.org/wiki/Release_early,_release_often):

Deliver early: The sooner you fail, the easier it is to recover. If you work for two years on a software component before it is released, there is a huge risk of deviation from the original requirements, which are usually wrong and change every few days.

Deliver often: Everything in the software is delivered to all the stakeholders so that they can give their input and see the changes reflected in the software. Errors can be fixed in a few days and improvements are identified easily.

Companies build big software components instead of smaller ones that work together because it feels like the natural thing to do, as follows: the developer has a new requirement; he builds a new method on an existing class in the service layer; the method is exposed on the API via HTTP, SOAP, or any other protocol. Now, repeat this by the number of developers in your company and you will obtain something called organic growth. Organic growth is the type of uncontrolled and unplanned growth of software systems under business pressure without adequate long-term planning, and it is bad.

How to tackle organic growth?

The first thing needed to tackle organic growth is to make sure that business and IT are aligned in the company. Usually, in big companies, IT is not seen as a core part of the business. Organizations outsource their IT systems, keeping the cost in mind, but not the quality, so the partners building these software components are focused on one thing: delivering on time and according to the specification, even if it is incorrect. This produces a less-than-ideal ecosystem for responding to business needs with a working solution for an existing problem. IT is led by people who barely understand how the systems are built and usually overlook the complexity of software development. Fortunately, this is a changing tendency, as IT systems have become the drivers of 99% of the businesses around the world, but we need to be smarter about how we build them. The first measure to tackle organic growth is to align IT and business stakeholders so that they work together; educating the non-technical stakeholders is the key to success.

If we go back to the example from the previous section (few releases with quite big changes), can we do it better? Of course we can: divide the work into manageable software artifacts that each model a single, well-defined business activity, and give each one an identity of its own. It does not need to be a microservice at this stage, but keeping the logic inside a separate, well-defined, easily testable, and decoupled module will give us a huge advantage when facing future changes in the application.
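As an illustration of such a module, here is a minimal sketch of one business activity kept behind a narrow, easily testable interface that could later be extracted into its own microservice. It is written in Python purely for brevity (the article's own examples use Node.js), and all names here are hypothetical:

class ConsoleMailer:
    # Stand-in mail transport so the sketch runs without any external service.
    def send(self, to, subject, body):
        print("to=%s subject=%s body=%s" % (to, subject, body))

class Notifications:
    # One well-defined business activity behind a narrow interface.
    def __init__(self, mailer):
        self._mailer = mailer  # injected, so the transport can be swapped later

    def send_welcome_email(self, address):
        # The business logic stays inside the module, hidden from callers.
        self._mailer.send(to=address, subject="Welcome", body="Thanks for signing up!")

if __name__ == "__main__":
    Notifications(ConsoleMailer()).send_welcome_email("user@example.com")

Callers depend only on the small interface, so moving the logic behind an HTTP endpoint later does not ripple through the rest of the codebase.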
Building microservices – The fallback strategy

When we design a system, we usually think about the replaceability of the existing components. For example, when using a persistence technology in Java, we tend to lean towards the standards (Java Persistence API (JPA)) so that we can replace the underlying implementation without too much effort. Microservices take the same approach, but they isolate the problem instead of working towards easy replaceability. Also, e-mailing is something that, although it seems simple, always ends up giving problems. Consider that we want to replace Mandrill with a plain SMTP server, such as Gmail. We don't need to do anything special; we just change the implementation and roll out the new version of our microservice, as follows:

var nodemailer = require('nodemailer');
var seneca = require("seneca")();

var transporter = nodemailer.createTransport({
  service: 'Gmail',
  auth: {
    user: 'info@micromerce.com',
    pass: 'verysecurepassword'
  }
});

/**
 * Sends an email including the content.
 */
seneca.add({area: "email", action: "send"}, function(args, done) {
  var mailOptions = {
    from: 'Micromerce Info ✔ <info@micromerce.com>',
    to: args.to,
    subject: args.subject,
    html: args.body
  };
  transporter.sendMail(mailOptions, function(error, info) {
    if (error) {
      return done({code: error}, null); // report the failure and stop; don't call done twice
    }
    done(null, {status: "sent"});
  });
});

To the outside world, our simplest version of the e-mail sender is now, to all appearances, using SMTP through Gmail to deliver our e-mails. We could even roll out one server with this version and send some traffic to it in order to validate our implementation without affecting all the customers (in other words, contain the failure).

Deploying microservices

Deployment is usually the ugly friend of the software development life cycle party. There is a missing contact point between development and system administration, which DevOps is going to solve in the following few years (or has already done it and no one told me). The cost of fixing software bugs grows steeply across the phases of development: the later a bug is found, the more it costs to fix. From continuous integration up to continuous delivery, the process should be automated as much as possible, where "as much as possible" means 100%. Remember, humans are imperfect; if we rely on humans carrying out a manual, repetitive process to produce bug-free software, we are walking the wrong path. Remember that a machine will always be error free (as long as the algorithm that is executed is error free), so why not let a machine control our infrastructure?

Summary

In this article, we saw why microservices are needed in complex software systems, examined the monolithic approach, and looked at how to build and deploy microservices.

article-image-how-to-build-12-factor-design-microservices-on-docker-part-2
Cody A.
29 Jun 2015
14 min read
Save for later

How to Build 12 Factor Microservices on Docker - Part 2

Welcome back to our how-to on Building and Running 12 Factor Microservices on Docker. In Part 1, we introduced a very simple Python Flask application which displayed a list of users from a relational database. Then we walked through the first four of these factors, reworking the example application to follow these guidelines. In Part 2, we'll be introducing a multi-container Docker setup as the execution environment for our application. We'll continue from where we left off with the next factor, number five.

Build, Release, Run. A 12-factor app strictly separates the process for transforming a codebase into a deploy into distinct build, release, and run stages. The build stage creates an executable bundle from a code repo, including vendoring dependencies and compiling binaries and asset packages. The release stage combines the executable bundle created in the build with the deploy's current config. Releases are immutable and form an append-only ledger; consequently, each release must have a unique release ID. The run stage runs the app in the execution environment by launching the app's processes against the release.

This is where your operations meet your development and where a PaaS can really shine. For now, we're assuming that we'll be using a Docker-based containerized deploy strategy. We'll start by writing a simple Dockerfile.

The Dockerfile starts with an ubuntu base image, and then I add myself as the maintainer of this app.

FROM ubuntu:14.04.2
MAINTAINER codyaray

Before installing anything, let's make sure that apt has the latest versions of all the packages.

RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list
RUN apt-get update

Install some basic tools and the requirements for running a Python webapp.

RUN apt-get install -y tar curl wget dialog net-tools build-essential
RUN apt-get install -y python python-dev python-distribute python-pip
RUN apt-get install -y libmysqlclient-dev

Copy over the application to the container.

ADD /. /src

Install the dependencies.

RUN pip install -r /src/requirements.txt

Finally, set the current working directory, expose the port, and set the default command.

EXPOSE 5000
WORKDIR /src
CMD python app.py
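Part 1 of this series built the actual Flask application, so it isn't reproduced here. Purely for orientation, a minimal stand-in for the app.py that the CMD runs might look like the following; the route and hard-coded data are placeholders, not the article's real user-listing code:

import os
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/users")
def users():
    # The real application reads these from the relational database;
    # they are hard-coded here only to keep the sketch self-contained.
    return jsonify(users=["alice", "bob"])

if __name__ == "__main__":
    # Bind to all interfaces so Docker's port mapping can reach the process.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))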
Now, the build phase consists of building a Docker image. You can build and store it locally with

docker build -t codyaray/12factor:0.1.0 .

If you look at your local repository, you should see the new image present.

$ docker images
REPOSITORY          TAG     IMAGE ID      CREATED     VIRTUAL SIZE
codyaray/12factor   0.1.0   bfb61d2bbb17  1 hour ago  454.8 MB

The release phase really depends on the details of the execution environment. You'll notice that none of the configuration is stored in the image produced from the build stage; however, we need a way to build a versioned release with the full configuration as well. Ideally, the execution environment would be responsible for creating releases from the source code and configuration specific to that environment. However, if we're working from first principles with Docker rather than a full-featured PaaS, one possibility is to build a new Docker image using the one we just built as a base. Each environment would have its own set of configuration parameters and thus its own Dockerfile. It could be something as simple as

FROM codyaray/12factor:0.1.0
MAINTAINER codyaray
ENV DATABASE_URL mysql://sa:mypwd@mydbinstance.abcdefghijkl.us-west-2.rds.amazonaws.com/mydb

This is simple enough to be programmatically generated, given the environment-specific configuration and the new container version to be deployed (a rough sketch of such a generator follows at the end of this section). For demonstration purposes, though, we'll call the above file Dockerfile-release so it doesn't conflict with the main application's Dockerfile. Then we can build it with

docker build -f Dockerfile-release -t codyaray/12factor-release:0.1.0.0 .

The resulting built image could be stored in the environment's registry as codyaray/12factor-release:0.1.0.0. The images in this registry would serve as the immutable ledger of releases. Notice that the version has been extended to include a fourth level which, in this instance, could represent configuration version "0" applied to source version "0.1.0". The key here is that these configuration parameters aren't collated into named groups (sometimes called "environments"). For example, these aren't static files named like Dockerfile.staging or Dockerfile.dev in a centralized repo. Rather, the set of parameters is distributed so that each environment maintains its own environment mapping in some fashion. The deployment system would be set up such that a new release to the environment automatically applies the environment variables it has stored to create a new Docker image. As always, the final deploy stage depends on whether you're using a cluster manager, scheduler, and so on. If you're using standalone Docker, then it would boil down to

docker run -P -t codyaray/12factor-release:0.1.0.0
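The article doesn't show the release-image generator itself, so here is a rough sketch of what "programmatically generated" could look like. The helper name, the configuration source, and the staging values are assumptions made for illustration, not the author's tooling:

import subprocess

def build_release(base_image, release_tag, env_config):
    # Render an environment-specific Dockerfile-release and build the immutable release image from it.
    lines = ["FROM " + base_image, "MAINTAINER codyaray"]
    lines += ["ENV {} {}".format(key, value) for key, value in env_config.items()]
    with open("Dockerfile-release", "w") as f:
        f.write("\n".join(lines) + "\n")
    subprocess.check_call(["docker", "build", "-f", "Dockerfile-release", "-t", release_tag, "."])

# Configuration version "0" applied to source version "0.1.0".
build_release(
    base_image="codyaray/12factor:0.1.0",
    release_tag="codyaray/12factor-release:0.1.0.0",
    env_config={"DATABASE_URL": "mysql://sa:mypwd@staging-db.example.com/mydb"},  # hypothetical staging value
)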
Processes. A 12-factor app is executed as one or more stateless processes which share nothing and are horizontally partitionable. All data which needs to be stored must use a stateful backing service, usually a database. This means no sticky sessions and no in-memory or local disk-based caches. These processes should never daemonize or write their own PID files; rather, they should rely on the execution environment's process manager (such as Upstart).

This factor must be considered up-front, in line with the discussions on antifragility, horizontal scaling, and overall application design. As the example app delegates all stateful persistence to a database, we've already succeeded on this point. However, it is good to note that a number of issues have been found with the standard ubuntu base image for Docker, one of which is its process management (or lack thereof). If you would like to use a process manager to automatically restart crashed daemons, or to notify a service registry or operations team, check out baseimage-docker. This image adds runit for process supervision and management, amongst other improvements to base ubuntu for use in Docker, such as obsoleting the need for PID files. To use this new image, we have to update the Dockerfile to set the new base image and use its init system instead of running our application as the root process in the container.

FROM phusion/baseimage:0.9.16
MAINTAINER codyaray

RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list
RUN apt-get update

RUN apt-get install -y tar git curl nano wget dialog net-tools build-essential
RUN apt-get install -y python python-dev python-distribute python-pip
RUN apt-get install -y libmysqlclient-dev

ADD /. /src

RUN pip install -r /src/requirements.txt

EXPOSE 5000
WORKDIR /src

RUN mkdir /etc/service/12factor
ADD 12factor.sh /etc/service/12factor/run

# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]

Notice the file 12factor.sh that we're now adding to /etc/service. This is how we instruct runit to run our application as a service. Let's add the new 12factor.sh file.

#!/bin/sh
python /src/app.py

Now the new containers we deploy will attempt to be a little more fault-tolerant by using an OS-level process manager.

Port Binding. A 12-factor app must be self-contained and bind to a port specified as an environment variable. It can't rely on the injection of a web container such as tomcat or unicorn; instead it must embed a server such as jetty or thin. The execution environment is responsible for routing requests from a public-facing hostname to the port-bound web process. This is trivial with most embedded web servers. If you're currently using an external web server, this may require more effort to support an embedded server within your application. For the example Python app (which uses the built-in Flask web server), it boils down to

port = int(os.environ.get("PORT", 5000))
app.run(host='0.0.0.0', port=port)

Now the execution environment is free to instruct the application to listen on whatever port is available. This obviates the need for the application to tell the environment what ports must be exposed, as we've been required to do with Docker.

Concurrency. Because a 12-factor app exclusively uses stateless processes, it can scale out by adding processes. A 12-factor app can have multiple process types, such as web processes, background worker processes, or clock processes (for cron-like scheduled jobs). As each process type is scaled independently, each logical process would become its own Docker container as well. We've already seen how to build a web process; other processes are very similar. In most cases, scaling out simply means launching more instances of the container. (It's usually not desirable to scale out the clock processes, though, as they often generate events that you want to be scheduled singletons within your infrastructure.)

Disposability. A 12-factor app's processes can be started or stopped (with a SIGTERM) at any time. Thus, minimizing startup time and gracefully shutting down is very important. For example, when a web service receives a SIGTERM, it should stop listening on the HTTP port, allow in-flight requests to finish, and then exit. Similarly, processes should be robust against sudden death; for example, worker processes should use a robust queuing backend. You want to ensure the web server you select can shut down gracefully. This is one of the trickier parts of selecting a web server, at least for many of the common Python HTTP servers that I've tried. In theory, shutting down based on receiving a SIGTERM should be as simple as follows.

import signal
signal.signal(signal.SIGTERM, lambda *args: server.stop(timeout=60))

But oftentimes, you'll find that this will immediately kill the in-flight requests as well as closing the listening socket. You'll want to test this thoroughly if dependable graceful shutdown is critical to your application.
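Graceful shutdown is easier to reason about for worker processes. As a complement to the web-server discussion above, here is a small stand-alone sketch (not part of the example app) that stops taking new work on SIGTERM but lets the job in progress finish:

import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Flip a flag instead of exiting immediately, so the current job can complete.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

while not shutting_down:
    # A real worker would pull the next job from a robust queue (for example, RabbitMQ),
    # so anything not yet acknowledged is redelivered if the process dies abruptly.
    time.sleep(1)  # simulate processing one unit of work

print("drained in-flight work, exiting cleanly")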
Dev/Prod Parity. A 12-factor app is designed to keep the gap between development and production small. Continuous deployment shrinks the amount of time that code lives in development but not production. A self-serve platform allows developers to deploy their own code in production, just like they do in their local development environments. Using the same backing services (databases, caches, queues, and so on) in development as in production reduces the number of subtle bugs that arise from inconsistencies between technologies or integrations. As we're deploying this solution using fully Dockerized containers and third-party backing services, we've effectively achieved dev/prod parity. For local development, I use boot2docker on my Mac, which provides a Docker-compatible VM to host my containers. Using boot2docker, you can start the VM and set up all the environment variables automatically with

boot2docker up
$(boot2docker shellinit)

Once you've initialized this VM and set the DOCKER_HOST variable to its IP address with shellinit, the docker commands given above work exactly the same for development as they do for production.

Logs. Consider logs as a stream of time-ordered events collected from all running processes and backing services. A 12-factor app doesn't concern itself with how its output is handled. Instead, it just writes its output to its `stdout` stream. The execution environment is responsible for collecting, collating, and routing this output to its final destination(s). Most logging frameworks either support logging to stderr/stdout by default or make it easy to switch from file-based logging to one of these streams. In a 12-factor app, the execution environment is expected to capture these streams and handle them however the platform dictates. Because our app doesn't have specific logging yet, and the only logs are from Flask and already go to stderr, we don't have any application changes to make.

However, we can show how an execution environment could handle the logs. We'll set up a Docker container which collects the logs from all the other Docker containers on the same host. Ideally, this would then forward the logs to a centralized service such as Elasticsearch. Here we'll demo using Fluentd to capture and collect the logs inside the log collection container; a simple configuration change would allow us to switch from writing these logs to disk, as we demo here, and instead send them from Fluentd to a local Elasticsearch cluster. We'll create a Dockerfile for our new logcollector container type. For more detail, you can find a Docker Fluentd tutorial here. We can call this file Dockerfile-logcollector.

FROM kiyoto/fluentd:0.10.56-2.1.1
MAINTAINER kiyoto@treasure-data.com
RUN mkdir /etc/fluent
ADD fluent.conf /etc/fluent/
CMD "/usr/local/bin/fluentd -c /etc/fluent/fluent.conf"

We use an existing Fluentd base image with a specific Fluentd configuration. Notably, this tails all the log files in /var/lib/docker/containers/<container-id>/<container-id>-json.log, adds the container ID to the log message, and then writes to JSON-formatted files inside /var/log/docker.

<source>
  type tail
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/fluentd-docker.pos
  time_format %Y-%m-%dT%H:%M:%S
  tag docker.*
  format json
</source>
<match docker.var.lib.docker.containers.*.*.log>
  type record_reformer
  container_id ${tag_parts[5]}
  tag docker.all
</match>
<match docker.all>
  type file
  path /var/log/docker/*.log
  format json
  include_time_key true
</match>

As usual, we create a Docker image. Don't forget to specify the logcollector Dockerfile.

docker build -f Dockerfile-logcollector -t codyaray/docker-fluentd .

We'll need to mount two directories from the Docker host into this container when we launch it. Specifically, we'll mount the directory containing the logs from all the other containers, as well as the directory to which we'll be writing the consolidated JSON logs.

docker run -d -v /var/lib/docker/containers:/var/lib/docker/containers -v /var/log/docker:/var/log/docker codyaray/docker-fluentd

Now if you check the /var/log/docker directory, you'll see the collated JSON log files. Note that this is on the Docker host rather than in any container; if you're using boot2docker, you can ssh into the Docker host with boot2docker ssh and then check /var/log/docker.
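On the application side, nothing special is needed beyond writing to stdout. If we later add explicit application logging, a minimal configuration (shown only as an illustration, not a change the article makes) could look like this:

import logging
import sys

# Treat logs as an event stream on stdout; the execution environment
# (here, Docker plus the Fluentd collector above) routes them onward.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logging.getLogger("12factor").info("user list requested")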
Admin Processes. Any admin or management tasks for a 12-factor app should be run as one-off processes within a deploy's execution environment. This process runs against a release, using the same codebase and configs as any process in that release, and uses the same dependency isolation techniques as the long-running processes. This is really a feature of your app's execution environment. If you're running a Docker-like containerized solution, this may be pretty trivial.

docker run -i -t --entrypoint /bin/bash codyaray/12factor-release:0.1.0.0

The -i flag instructs docker to provide an interactive session, that is, to keep the input and output ttys attached. Then we instruct docker to run the /bin/bash command instead of another 12factor app instance. This creates a new container based on the same Docker image, which means we have access to all the code and configs for this release. This will drop us into a bash terminal to do whatever we want. But let's say we want to add a new "friends" table to our database, so we wrote a migration script add_friends_table.py. We could run it as follows:

docker run -i -t --entrypoint python codyaray/12factor-release:0.1.0.0 /src/add_friends_table.py

As you can see, following the few simple rules specified in the 12 Factor manifesto really allows your execution environment to manage and scale your application. While this may not be the most feature-rich integration within a PaaS, it is certainly very portable, with a clean separation of responsibilities between your app and its environment. Many of the tools and integrations demonstrated here amounted to a do-it-yourself container approach to the environment, which would be subsumed by an external, vertically integrated PaaS such as Deis. If you're not familiar with Deis, it's one of several competitors in the open source platform-as-a-service space which allows you to run your own PaaS on a public or private cloud. Like many, Deis is inspired by Heroku. So instead of Dockerfiles, Deis uses a buildpack to transform a code repository into an executable image and a Procfile to specify an app's processes. Finally, by default you can use a specialized git receiver to complete a deploy. Instead of having to manage separate build, release, and deploy stages yourself like we described above, deploying an app to Deis could be as simple as

git push deis-prod

While it can't get much easier than this, you're certainly trading control for simplicity. It's up to you to determine which works best for your business.

Find more Docker tutorials alongside our latest releases on our dedicated Docker page.

About the Author

Cody A. Ray is an inquisitive, tech-savvy, entrepreneurially-spirited dude. Currently, he is a software engineer at Signal, an amazing startup in downtown Chicago, where he gets to work with a dream team that's changing the service model underlying the Internet.