Virtualization | 115 articles | Tech News, Tutorials & Expert Insights

article-image-new-for-2020-in-operations-and-infrastructure-engineering

19 Dec 2019

5 min read

New for 2020 in operations and infrastructure engineering

19 Dec 2019

It’s an exciting time if you work in operations and software infrastructure. Indeed, you could even say that as the pace of change and innovation increases, your role only becomes more important. Operations and systems engineers, solution architects, everyone - you’re jobs are all about bringing stability, order and control into what can sometimes feel like chaos. As anyone that’s been working in the industry knows, managing change, from a personal perspective, requires a lot of effort. To keep on top of what’s happening in the industry - what tools are being released and updated, what approaches are gaining traction - you need to have one eye on the future and the wider industry. To help you with that challenge and get you ready for 2020, we’ve put together a list of what’s new for 2020 - and what you should start learning. Learn how to make Kubernetes work for you It goes without saying that Kubernetes was huge in 2019. But there are plenty of murmurs and grumblings that it’s too complicated and adds an additional burden for engineering and operations teams. To a certain extent there’s some truth in this - and arguably now would be a good time to accept that just because it seems like everyone is using Kubernetes, it doesn’t mean it’s the right solution for you. However, having said that, 2020 will be all about understanding how to make Kubernetes relevant to you. This doesn’t mean you should just drop the way you work and start using Kubernetes, but it does mean that spending some time with the platform and getting a better sense of how it could be used in the future is a useful way to spend your learning time in 2020. Explore Packt's extensive range of Kubernetes eBooks and videos on the Packt store. Learn how to architect If software has eaten the world, then by the same token perhaps complexity has well and truly eaten software as we know it. Indeed, Kubernetes is arguably just one of the symptoms and causes of this complexity. Another is the growing demand for architects in engineering and IT teams. There are a number of different ‘architecture’ job roles circulating across the industry, from solutions architect to application architect. While they each have their own subtle differences, and will even vary from company to company, they’re all roles that are about organizing and managing different pieces into something that is both stable and value-driving. Cloud has been particularly instrumental in making architect roles more prominent in the industry. As organizations look to resist the pitfalls of lock-in and better manage resources (financial and otherwise), it will be down to architects to balance business and technology concerns carefully. Learn how to architect cloud native applications. Read Architecting Cloud Computing Solutions. Get to grips with everything you need to know to be a software architect. Pick up Software Architect's Handbook. Artificial intelligence It’s strange that the hype around AI doesn’t seem to have reached the world of ops. Perhaps this is because the area is more resistant to the spin that comes with AI, preferring instead to focus more on the technical capabilities of tools and platforms. Whatever the case, it’s nevertheless true that AI will play an important part in how we manage and secure infrastructure. From monitoring system health, to automating infrastructure deployments and configuration, and even identifying security threats, artificial intelligence is already an important component for operations engineers and others. Indeed, artificial intelligence is being embedded inside products and platforms that ops teams are using - this means the need to ‘learn’ artificial intelligence is somewhat reduced. But it would be wrong to think it’s something that can just be managed from a dashboard. In 2020 it will be essential to better understand where and how artificial intelligence can fit into your operations and architectural toolchain. Find artificial intelligence eBooks and videos in Packt's collection of curated data science bundles. Observability, monitoring, tracing, and logging One of the challenges of software complexity is understanding exactly what’s going on under the hood. Yes, the network might be unreliable, as the saying goes, but what makes things even worse is that we’re not even sure why. This is where observability and the next generation of monitoring, logging and tracing all come into play. Having detailed insights into how applications and infrastructures are performing, how resources are being managed, and what things are actually causing problems is vitally important from a team perspective. Without the ability to understand these things, it can put pressure on teams as knowledge becomes siloed inside the brains of specific engineers. It makes you vulnerable to failure as you start to have points of failure at a personnel level. There are, of course, a wide range of tools and products available that can make monitoring and tracing easy (or easier, at least). But understanding which ones are right for your needs still requires some time learning and exploring the options out there. Make sure you do exactly that in 2020. Learn how to monitor distributed systems with Learn Centralized Logging and Monitoring with Kubernetes. Making serverless a reality We’ve talked about serverless a lot this year. But as a concept there’s still considerable confusion about what role it should play in modern DevOps processes. Indeed, even the nomenclature is a little confusing. Platforms using their own terminology, such as ‘lambdas’ and ‘functions’, only adds to the sense that serverless is something amorphous and hard to pin down. So, in 2020, we need to work out how to make serverless work for us. Just as we need to consider how Kubernetes might be relevant to our needs, we need to consider in what ways serverless represents both a technical and business opportunity. Search Packt's library for the latest serverless eBooks and videos. Explore more technology eBooks and videos on the Packt store.

0
0
4617

article-image-chaos-engineering-comes-to-kubernetes-thanks-to-gremlin

Richard Gall

18 Nov 2019

2 min read

Chaos engineering comes to Kubernetes thanks to Gremlin

Richard Gall

18 Nov 2019

2 min read

Kubernetes causes problems. Just last week Cindy Sridharan wrote on Twitter that while Docker "succeeded... because it was a great developer tool," Kubernetes "decided to be all things tech and not much by way of UX. It was and remains a hostile piece of software to learn, run, operate, maintain." https://twitter.com/copyconstruct/status/1194701905248673792?s=20 That's just a drop in the ocean - you don't have to look hard to find more hot takes, jokes, and memes about how complicated working with Kubernetes can feel. Despite all this, it's certainly here to stay. That makes chaos engineering platform Gremlin's announcement that the platform will offer native support for Kubernetes particularly welcome. Citing container orchestration research done by Datadog in the press release, which indicates the rapid rate of Kubernetes adoption, Gremlin is hoping that it can provide some additional support for users that might be concerned about the platforms complexity. From last year: Gremlin makes chaos engineering with Docker easier with new container discovery feature Gremlin CTO Matt Fornaciari said "our goal is to provide SRE and DevOps teams that are building and deploying modern applications with the tools and processes necessary to understand how their systems handle failure, before that failure has the chance to impact customers and business." The new feature is designed to help engineers do exactly that by allowing them "to automate the process of identifying Kubernetes primitives such as nodes and pods," and to select and attack traffic from different services within Kubernetes. The other important element to all this is that Gremlin wants to make things as straightforward as possible for engineering teams. With a neat and easy to use UI, it would seem that, to return to Sridharan's words, the team are eager to make sure their product is "a great developer tool." The tool has already been tried and tested in the wild. Simon Govier, Expedia's Director of Program Management described how performing chaos experiments on Kubernetes with Gremlin "significantly reduces the amount of time it takes to do fault injection and increases our systems' resilience to failure." Learn more on the Gremlin website.

0
0
4054

article-image-6-signs-you-need-containers

Richard Gall

05 Feb 2019

9 min read

6 signs you need containers

Richard Gall

05 Feb 2019

9 min read

I’m not about to tell you containers is a hot new trend - clearly, it isn’t. Today, they are an important part of the mainstream software development industry that probably won't be disappearing any time soon. But while containers certainly can’t be described as a niche or marginal way of deploying applications, they aren’t necessarily ubiquitous. There are still developers or development teams yet to fully appreciate the usefulness of containers. You might know them - you might even be one of them. Joking aside, there are often many reasons why people aren’t using containers. Sometimes these are good reasons: maybe you just don’t need them. Often, however, you do need them, but the mere thought of changing your systems and workflow can feel like more trouble than it’s worth. If everything seems to be (just about) working, why shake things up? Well, I’m here to tell you that more often than not it is worthwhile. But to know that you’re not wasting your time and energy, there are a few important signs that can tell you if you should be using containers. Download Containerize Your Apps with Docker and Kubernetes for free, courtesy of Microsoft. Your codebase is too complex There are few developers in the world who would tell you that their codebase couldn’t do with a little pruning and simplification. But if your code has grown into a beast that everyone fears and doesn’t really understand, containers could probably help you a lot. Why do containers help simplify your codebase? Let’s think about how spaghetti code actually happens. Yes, it always happens by accident, but usually it’s something that evolves out of years of solving intractable problems with knock on effects and consequences that only need to be solved later. By using containers you can begin to think differently about your code. Instead of everything being tied up together, like a complex concrete network of road junctions, containers allow you to isolate specific parts of it. When you can better isolate your code, you can also isolate different problems and domains. This is one of the reasons that containers is so closely aligned with microservices. Software testing is nightmarish The efficiency benefits of containers are well documented, but the way containers can help the software testing process is often underplayed - this probably says more about a general inability to treat testing with the respect and time it deserves as much as anything else. How do containers make testing easier? There are a number of reasons containers make software testing easier. On the one hand, by using containers you’re reducing that gap between the development environment and production, which means you shouldn’t be faced with as many surprises once your code hits production as you sometimes might. Containers also make the testing process faster - you only need to test against a container image, you don’t need a fully-fledged testing environment for every application you do tests on. What this all boils down to is that testing becomes much quicker and easier. In theory, then, this means the testing process fits much more neatly within the development workflow. Code quality should never be seen as a bottleneck; with containers it becomes much easier to embed the principle in your workflow. Read next: How to build 12 factor microservices on Docker Your software isn’t secure - you’ve had breaches that could have been prevented Spaghetti code, lack of effective testing can lead to major security risks. If no one really knows what’s going on inside your applications and inside your code it’s inevitable that you’ll have vulnerabilities. And, in turn, it’s highly likely these vulnerabilities will be exploited. How can containers make software security easier? Because containers allow you to make changes to parts of your software infrastructure (rather than requiring wholesale changes), this makes security patches much easier to achieve. Essentially, you can isolate the problem and tackle it. Without containers, it becomes harder to isolate specific pieces of your infrastructure, which means any changes could have a knock on effect on other parts of your code that you can’t predict. That all being said, it probably is worth mentioning that containers do still pose a significant set of security challenges. While simplicity in your codebase can make testing easier, you are replacing simplicity at that level with increased architectural complexity. To really feel the benefits of container security, you need a strong sense of how your container deployments are working together and how they might interact. Your software infrastructure is expensive (you feel the despair of vendor lock-in) Running multiple virtual machines can quickly get expensive. In terms of both storage and memory, if you want to scale up, you’re going to be running through resources at a rapid rate. While you might end up spending big on more traditional compute resources, the tools around container management and automation are getting cheaper. One of the costs of many organization’s software infrastructure is lock-in. This isn’t just about price, it’s about the restrictions that come with sticking with a certain software vendor - you’re spending money on software systems that are almost literally restricting your capacity for growth and change. How do containers solve the software infrastructure problem and reduce vendor lock-in? Traditional software infrastructure - whether that’s on-premise servers or virtual ones - is a fixed cost - you invest in the resources you need, and then either use it or you don’t. With containers running on, say, cloud, it becomes a lot easier to manage your software spend alongside strategic decisions about scalability. Fundamentally, it means you can avoid vendor lock-in. Yes, you might still be paying a lot of money for AWS or Azure, but because containers are much more portable, moving your applications between providers is much less hassle and risk. Read next: CNCF releases 9 security best practices for Kubernetes, to protect a customer’s infrastructure DevOps is a war, not a way of working Like containers, DevOps could hardly be considered a hot new trend any more. But this doesn’t mean it’s now part of the norm. There are plenty of organizations that simply don’t get DevOps, or, at the very least, seem to be stumbling their way through sprint meetings with little real alignment between development and operations. There could be multiple causes for this conflict (maybe people just don’t get on), but DevOps often fails where the code that’s being written and deployed is too complicated for anyone to properly take accountability. This takes us back to the issue of the complex codebase. Think of it this way - if code is a gigantic behemoth that can’t be easily broken up, the unintended effects and consequences of every new release and update can cause some big problems - both personally and technically. How do containers solve DevOps challenges? Containers can help solve the problems that DevOps aims to tackle by breaking up software into different pieces. This means that developers and operations teams have much more clarity on what code is being written and why, as well as what it should do. Indeed, containers arguably facilitate DevOps practices much more effectively than DevOps proponents have been trying to do in pre-container years. Adding new product features is a pain The issue of adding features or improving applications is a complaint that reaches far beyond the development team. Product management, marketing - these departments will all bemoan the ability to make necessary changes or add new features that they will argue is business critical. Often, developers will take the heat. But traditional monolithic applications make life difficult for developers - you simply can’t make changes or updates. It’s like wanting to replace a radiator and having to redo your house’s plumbing. This actually returns us to the earlier point about DevOps - containers makes DevOps easier because it enables faster delivery cycles. You can make changes to an application at the level of a container or set of containers. Indeed, you might even simply kill one container and replace it with a new one. In turn, this means you can change and build things much more quickly. How do containers make it easier to update or build new features? To continue with the radiator analogy: containers would allow you to replace or change an individual radiator without having to gut your home. Essentially, if you want to add a new feature or change an element, you wouldn’t need to go into your application and make wholesale changes - that may have unintended consequences - instead, you can simply make a change by running the resources you need inside a new container (or set of containers). Watch for the warning signs As with any technology decision, it’s well worth paying careful attention to your own needs and demands. So, before fully committing to containers, or containerizing an application, keep a close eye on the signs that they could be a valuable option. Containers may well force you to come face to face with the reality of technical debt - and if it does, so be it. There’s no time like the present, after all. Of course, all of the problems listed above are ultimately symptoms of broader issues or challenges you face as a development team or wider organization. Containers shouldn’t be seen as a sure-fire corrective, but they can be an important element in changing your culture and processes. Learn how to containerize your apps with a new eBook, free courtesy of Microsoft. Download it here.

0
0
5503

article-image-its-black-friday-but-whats-the-business-and-developer-cost-of-downtime

Richard Gall

23 Nov 2018

4 min read

It's Black Friday: But what's the business (and developer) cost of downtime?

Richard Gall

23 Nov 2018

4 min read

Black Friday is back, and, as you've probably already noticed, with a considerable vengeance. According to Adobe Analytics data, online spending is predicted to hit $3.7 billion over this holiday season in the U.S, up from $2.9 billion in 2017. But while consumers clamour for deals and businesses reap the rewards, it's important to remember there's a largely hidden plane of software engineering labour. Without this army of developers, consumers will most likely be hitting their devices in frustration, while business leaders will be missing tough revenue targets - so, as we enter into Black Friday let's pour one out for all those engineers on call and trying their best to keep eCommerce sites on their feet. Here's to the software engineers keeping things running on Black Friday Of course, the pain that hits on days like Black Friday and Cyber Monday can be minimised with smart planning and effective decision making long before those sales begin. However, for engineering teams under-resourced and lacking the right tools, that is simply impossible. This means that software engineers are left in a position where they're treading water, knowing that they're going to be sinking once those big days come around. It doesn't have to be like this. With smarter leadership and, indeed, more respect for the intensive work engineers put in to make websites and apps actually work, revenue driving platforms can become more secure, resilient and stable. Chaos engineering platform Gremlin publishes the 'true cost of downtime' This is the central argument of chaos engineering platform Gremlin, who we've covered a number of times this year. To coincide with Black Friday the team has put together what they believe is the 'true cost of downtime'. On the one hand this is a good marketing hook for their chaos engineering platform, but, cynicism aside, it's also a good explanation of why the principles of chaos engineering can be so valuable from both a business and developer perspective. Estimating the annual revenue of some of the biggest companies in the world, Gremlin has been then created an interactive table to demonstrate what the cost of downtime for each of those businesses would be, for the length of time you are on the page. For 20 minutes downtime, Amazon.com would have lost a staggering $4.4 million. For Walgreens it's more than $80,000. Gremlin provide some context to all this, saying: "Enterprise commerce businesses typically rely on a complex microservices architecture, from fulfillment, to website security, ability to scale with holiday traffic, and payment processing - there is a lot that can go wrong and impact revenue, damage customer trust, and consume engineering time. If an ecommerce site isn’t 100% online and performant, it’s losing revenue." "The holiday season is especially demanding for SREs working in ecommerce. Even the most skilled engineering teams can struggle to keep up with the demands of peak holiday traffic (i.e. Black Friday and Cyber Monday). Just going down for a few seconds can mean thousands in lost revenue, but for some sites, downtime can be exponentially more expensive." For Gremlin, chaos engineering is clearly the answer to many of the problems days like Black Friday poses. While it might not work for every single organization, it's nevertheless true that failing to pay attention to the value of your applications and websites at an hour by hour level could be incredibly damaging. With outages on Facebook, WhatsApp, and Instagram happening earlier this week, these problems aren't hidden away - they're in full view of the public. What does remain hidden, however, is the work and stress that goes in to tackling these issues and ensuring things are working as they should be. Perhaps it's time to start learning the lessons of Black Friday - business revenues will be that little bit healthier, but engineers will also be that little bit happier.

0
0
4972

article-image-chaos-conf-2018-recap-chaos-engineering-hits-maturity-as-community-moves-towards-controlled-experimentation

Richard Gall

12 Oct 2018

11 min read

Chaos Conf 2018 Recap: Chaos engineering hits maturity as community moves towards controlled experimentation

Richard Gall

12 Oct 2018

11 min read

Conferences can sometimes be confusing. Even at the most professional and well-planned conferences, you sometimes just take a minute and think what's actually the point of this? Am I learning anything? Am I meant to be networking? Will anyone notice if I steal extra food for the journey home? Chaos Conf 2018 was different, however. It had a clear purpose: to take the first step in properly forging a chaos engineering community. After almost a decade somewhat hidden in the corners of particularly innovative teams at Netflix and Amazon, chaos engineering might feel that its time has come. As software infrastructure becomes more complex, less monolithic, and as business and consumer demands expect more of the software systems that have become integral to the very functioning of life, resiliency has never been more important but more challenging to achieve. But while it feels like the right time for chaos engineering, it hasn't quite established itself in the mainstream. This is something the conference host, Gremlin, a platform that offers chaos engineering as a service, is acutely aware of. On the hand it's actively helping push chaos engineering into the hands of businesses, but on the other its growth and success, backed by millions of VC cash (and faith), depends upon chaos engineering becoming a mainstream discipline in the DevOps and SRE worlds. It's perhaps this reason that the conference felt so important. It was, according to Gremlin, the first ever public chaos engineering conference. And while it was relatively small in the grand scheme of many of today's festival-esque conferences attended by thousands of delegates (Dreamforce, the Salesforce conference, was also running in San Francisco in the same week), the fact that the conference had quickly sold out all 350 of its tickets - with more hoping on waiting lists - indicates that this was an event that had been eagerly awaited. And with some big names from the industry - notably Adrian Cockcroft from AWS and Jessie Frazelle from Microsoft - Chaos Conf had the air of an event that had outgrown its insider status before it had even began. The renovated cinema and bar in San Francisco's Mission District, complete with pinball machines upstairs, was the perfect container for a passionate community that had grown out of the clean corporate environs of Silicon Valley to embrace the chaotic mess that resembles modern software engineering. Kolton Andrus sets out a vision for the future of Gremlin and chaos engineering Chaos Conf was quick to deliver big news. They keynote speech, by Gremlin co-founder Kolton Andrus launched Gremlin's brand new Application Level Fault Injection (ALFI) feature, which makes it possible to run chaos experiments at an application level. Andrus broke the news by building towards it with a story of the progression of chaos engineering. Starting with Chaos Monkey, the tool first developed by Netflix, and moving from infrastructure to network, he showed how, as chaos engineering has evolved, it requires and faciliates different levels of control and insight on how your software works. "As we're maturing, the host level failures and the network level failures are necessary to building a robust and resilient system, but not sufficient. We need more - we need a finer granularity," Andrus explains. This is where ALFI comes in. By allowing Gremlin users to inject failure at an application level, it allows them to have more control over the 'blast radius' of their chaos experiments. The narrative Andrus was setting was clear, and would ultimately inform the ethos of the day - chaos engineering isn't just about chaos, it's about controlled experimentation to ensure resiliency. To do that requires a level of intelligence - technical and organizational - about how the various components of your software work, and how humans interact with them. Adrian Cockcroft on the importance of historical context and domain knowledge Adrian Cockcroft (@adrianco) VP at AWS followed Andrus' talk. In it he took the opportunity to set the broader context of chaos engineering, highlighting how tackling system failures is often a question of culture - how we approach system failure and think about our software. Developers love to learn things from first principles" he said. "But some historical context and domain knowledge can help illuminate the path and obstacles." If this sounds like Cockcroft was about to stray into theoretical territory, he certainly didn't. He offered plenty of practical frameworks for thinking through potential failure. But the talk wasn't theoretical - Cockcroft offered a taxonomy of failure that provides a useful framework for thinking through potential failure at every level. He also touched on how he sees the future of resiliency evolving, focusing on: Observability of systems Epidemic failure modes Automation and continuous chaos The crucial point Cockcroft makes is that cloud is the big driver for chaos engineering. "As datacenters migrate to the cloud, fragile and manual disaster recovery will be replaced by chaos engineering" read one of his slides. But more than that, the cloud also paves the way for the future of the discipline, one where 'chaos' is simply an automated part of the test and deployment pipeline. Selling chaos engineering to your boss Kriss Rochefolle, DevOps engineer and author of one of the best selling DevOps books in French, delivered a short talk on how engineers can sell chaos to their boss. He takes on the assumption that a rational proposal, informed by ROI is the best way to sell chaos engineering. He suggests instead that engineers need to play into emotions, and presenting chaos engineer as a method for tackling and minimizing the fear of (inevitable failure. Follow Kriss on Twitter: @crochefolle Walmart and chaos engineering Vilas Veraraghavan, the Director of Engineering was keen to clarify that Walmart doesn't practice chaos. Rather it practices resiliency - chaos engineering is simply a method the organization uses to achieve that. It was particularly useful to note the entire process that Vilas' team adopts when it comes to resiliency, which has largely developed out of Vilas' own work building his team from scratch. You can learn more about how Walmart is using chaos engineering for software resiliency in this post. Twitter's Ronnie Chen on diving and planning for failure Ronnie Chen (@rondoftw) is an engineering manager at Twitter. But she didn't talk about Twitter. In fact, she didn't even talk about engineering. Instead she spoke about her experience as a technical diver. By talking about her experiences, Ronnie was able to make a number of vital points about how to manage and tackle failure as a team. With mortality rates so high in diving, it's a good example of the relationship between complexity and risk. Chen made the point that things don't fail because of a single catalyst. Instead, failures - particularly fatal ones - happen because of a 'failure cascade'. Chen never makes the link explicit, but the comparison is clear - the ultimate outcome (ie. success or failure) is impacted by a whole range of situational and behavioral factors that we can't afford to ignore. Chen also made the point that, in diving, inexperienced people should be at the front of an expedition. "If you're inexperienced people are leading, they're learning and growing, and being able to operate with a safety net... when you do this, all kinds of hidden dependencies reveal themselves... every undocumented assumption, every piece of ancient team lore that you didn't even know you were relying on, comes to light." Charity Majors on the importance of observability Charity Majors (@mipsytipsy), CEO of Honeycomb, talked in detail about the key differences between monitoring and observability. As with other talks, context was important: a world where architectural complexity has grown rapidly in the space of a decade. Majors made the point that this increase in complexity has taken us from having known unknowns in our architectures, to many more unknown unknowns in a distributed system. This means that monitoring is dead - it simply isn't sophisticated enough to deal with the complexities and dependencies within a distributed system. Observability, meanwhile, allows you to to understand "what's happening in your systems just by observing it from the outside." Put simply, it lets you understand how your software is functioning from your perspective - almost turning it inside out. Majors then linked the concept to observability to the broader philosophy of chaos engineering - echoing some of the points raised by Adrian Cockcroft in his keynote. But this was her key takeaway: "Software engineers spend too much time looking at code in elaborately falsified environments, and not enough time observing it in the real world." This leads to one conclusion - the importance of testing in production. "Accept no substitute." Tammy Butow and Ana Medina on making an impact Tammy Butow (@tammybutow) and Ana Medina (@Ana_M_Medina) from Gremlin took us through how to put chaos engineering into practice - from integrating it into your organizational culture to some practical tests you can run. One of the best examples of putting chaos into practice is Gremlin's concept of 'Failure Fridays', in which chaos testing becomes a valuable step in the product development process, to dogfood it and test out how a customer experiences it. Another way which Tammy and Ana suggested chaos engineering can be used was as a way of testing out new versions of technologies before you properly upgrade in production. To end, their talk, they demo'd a chaos battle between EKS (Kubernetes on AWS) and AKS (Kubernetes on Azure), doing an app container attack, a packet loss attack and a region failover attack. Jessie Frazelle on how containers can empower experimentation Jessie Frazelle (@jessfraz) didn't actually talk that much about chaos engineering. However, like Ronnie Chen's talk, chaos engineering seeped through what she said about bugs and containers. Bugs, for Frazelle, are a way of exploring how things work, and how different parts of a software infrastructure interact with each other: "Bugs are like my favorite thing... some people really hate when they get one of those bugs that turns out to be a rabbit hole and your kind of debugging it until the end of time... while debugging those bugs I hate them but afterwards, I'm like, that was crazy!" This was essentially an endorsement of the core concept of chaos engineering - injecting bugs into your software to understand how it reacts. Jessie then went on to talk about containers, joking that they're NOT REAL. This is because they're made up of numerous different component parts, like Cgroups, namespaces, and LSMs. She contrasted containers with Virtual machines, zones and jails, which are 'first class concepts' - in other worlds, real things (Jessie wrote about this in detail last year in this blog post). In practice what this means is that whereas containers are like Lego pieces, VMs, zones, and jails are like a pre-assembled lego set that you don't need to play around with in the same way. From this perspective, it's easy to see how containers are relevant to chaos engineering - they empower a level of experimentation that you simply don't have with other virtualization technologies. "The box says to build the death star. But you can build whatever you want." The chaos ends... Chaos Conf was undoubtedly a huge success, and a lot of credit has to go to Gremlin for organizing the conference. It's clear that the team care a lot about the chaos engineering community and want it to expand in a way that transcends the success of the Gremlin platform. While chaos engineering might not feel relevant to a lot of people at the moment, it's only a matter of time that it's impact will be felt. That doesn't mean that everyone will suddenly become a chaos engineer by July 2019, but the cultural ripples will likely be felt across the software engineering landscape. But without Chaos Conf, it would be difficult to see chaos engineering growing as a discipline or set of practices. By sharing ideas and learning how other people work, a more coherent picture of chaos engineering started to emerge, one that can quickly make an impact in ways people wouldn't have expected six months ago. You can watch videos of all the talks from Chaos Conf 2018 on YouTube.

0
0
5516

article-image-log-monitoring-tools-for-continuous-security-monitoring-policy-tutorial

Fatema Patrawala

13 Jul 2018

11 min read

Log monitoring tools for continuous security monitoring policy [Tutorial]

Fatema Patrawala

13 Jul 2018

11 min read

How many times we have heard of organization's entire database being breached and downloaded by the hackers. The irony is, they are not even aware about anything until the hacker is selling the database details on the dark web after few months. Even though they implement decent security controls, what they lack is continuous security monitoring policy. It is one of the most common things that you might find in a startup or mid-sized organization. In this article, we will show how to choose the right log monitoring tool to implement continuous security monitoring policy. You are reading an excerpt from the book Enterprise Cloud Security and Governance, written by Zeal Vora. Log monitoring is a must in security Log monitoring is considered to be part of the de facto list of things that need to be implemented in an organization. It gives us the power of visibility of various events through a single central solution so we don't have to end up doing less or tail on every log file of every server. In the following screenshot, we have performed a new search with the keyword not authorized to perform and the log monitoring solution has shown us such events in a nice graphical way along with the actual logs, which span across days: Thus, if we want to see how many permission denied events occurred last week on Wednesday, this will be a 2-minute job if we have a central log monitoring solution with search functionality. This makes life much easier and would allow us to detect anomalies and attacks in a much faster than traditional approach. Choosing the right log monitoring tool This is a very important decision that needs to be taken by the organization. There are both commercial offerings as well as open source offerings that are available today but the amount of efforts that need to be taken in each of them varies a lot. I have seen many commercial offerings such as Splunk and ArcSight being used in large enterprises, including national level banks. On the contrary, there are also open source offerings, such as ELK Stack, that are gaining popularity especially after Filebeat got introduced. At a personal level, I really like Splunk but it gets very expensive when you have a lot of data being generated. This is one of the reasons why many startups or mid-sized organizations use commercial offering along with open source offerings such as ELK Stack. Having said that, we need to understand that if you decide to go with ELK Stack and have a large amount of data, then ideally you would need a dedicated person to manage it. Just to mention, AWS also has a basic level of log monitoring capability available with the help of CloudWatch. Let's get started with logging and monitoring There will always be many sources from which we need to monitor logs. Since it will be difficult to cover each and every individual source, we will talk about two primary ones, which we will be discussing sequentially: VPC flow logs AWS Config VPC flow logs VPC flow logs is a feature that allows us to capture information related to IP traffic that goes to and from the network interfaces within the VPC. VPC flow logs help in both troubleshooting related to why certain traffic is not reaching the EC2 instances and also understanding what the traffic is that is accepted and rejected. The VPC flow logs can be part of individual network interface level of an EC2 instance. This allows us to monitor how many packets are accepted or rejected in a specific EC2 instance running in the DMZ maybe. By default, the VPC flow logs are not enabled, so we will go ahead and enable the VPC flow log within our VPC: Enabling flow logs for VPC: In our environment, we have two VPCs named Development and Production. In this case, we will enable the VPC flow logs for development VPC: In order to do that, click on the Development VPC and select the Flow Logs tab. This will give you a button named Create Flow Log. Click on it and we can go ahead with the configuration procedure: Since the VPC flow logs data will be sent to CloudWatch, we need to select the IAM Role that gives these permissions: Before we go ahead in creating our first flow log, we need to create the CloudWatch log group as well where the VPC flow logs data will go into. In order to do it, go to CloudWatch, select the Logs tab. Name the log group according to what you need and click on Create log group: Once we have created our log group, we can fill the Destination Log Group field with our log group name and click on the Create Flow Log button: Once created, you will see the new flow log details under the VPC subtab: Create a test setup to check the flow: In order to test if everything is working as intended, we will start our test OpenVPN instance and in the security group section, allow inbound connections on port 443 and icmp (ping). This gives us the perfect base for a plethora of attackers detecting our instance and running a plethora of attacks on our server: Analyze flow logs in CloudWatch: Before analyzing for flow logs, I went for a small walk so that we can get a decent number of logs when we examine; thus, when I returned, I began analyzing the flow logs data. If we observe the flow log data, we see plenty of packets, which have REJECT OK at the end as well as ACCEPT OK. Flow logs can be unto specific interface levels, which are attached to EC2 instances. So, in order to check the flow logs, we need to go to CloudWatch, select the Log Groups tab, inside it select the log group that we created and then select the interface. In our case, we selected the interface related to the OpenVPN instance, which we had started: CloudWatch gives us the capability to filter packets based on certain expressions. We can filter all the rejected packets by creating a simple search for REJECT OK in the search bar and CloudWatch will give us all the traffic that was rejected. This is shown in the following image: Viewing the logs in GUI: Plain text data is good but it's not very appealing and does not give you deep insights about what exactly is happening. It's always preferred to send these logs to a Log Monitoring tool, which can give you deep insights about what exactly is happening. In my case, I have used Splunk to give us an overview about the logs in our environment. When we look into VPC Flow Logs, we see that Splunk gives us great detail in a very nice GUI and also maps the IP addresses to the location from which the traffic is coming: The following image is the capture of VPC flow logs which are being sent to the Splunk dashboard for analyzing the traffic patterns: The VPC Flow Logs traffic rate and location-related data The top rejected destination and IP address, which we rejected AWS Config AWS Config is a great service that allows us to continuously assess and audit the configuration of the AWS-related resources. With AWS Config, we can exactly see what configuration has changed from the previous week to today for services such as EC2, security groups, and many more. One interesting feature that Config allows is to set the compliance test as shown in the following screenshots. We see that there is one rule that is failing and is considered non-compliant, which is the CloudTrail. There are two important features that Config service provides: Evaluate changes in resources over the timeline Compliance checks Once they are enabled and you have associated Config rules accordingly, then you would see a dashboard similar to the following screenshot: In the preceding screenshot, on the left-hand side, Config gives details related to the Resources, which are present in your AWS; and on the right-hand column, Config gives us the status if the resources are compliant or non-compliant according to the rules that are set. Configuring the AWS Config service Let's look into how we can get started with the AWS Config service and have great dashboards along with compliance checks, which we saw in the previous screenshot: Enabling the Config service: The first time when we want to start working with Config, we need to select the resources we want to evaluate. In our case, we will select both the region-specific resources as well as global resources such as IAM: Configure S3 and IAM: Once we decide to include all the resources, the next thing is to create an Amazon S3 bucket where AWS Config will store the configuration and snapshot files. We will also need to select IAM role, which will allow Config into put these files to the S3 bucket: Select Config rules: Configuration rules are checks against your AWS resources, which can be done and the result will be part of the compliance standard. For example, root-account-mfa-enabled rule will check whether the ROOT account has MFA enabled or disabled and in the end it will give you a nice graphical overview about the output of the checks conducted by the rules. Currently, there are 38 AWS-managed rules, which we can select and use anytime; however, we can have custom rules anytime as well. For our case, I will use five specific rules, which are as follows: cloudtrail-enabled iam-password-policy restricted-common-ports restricted-ssh root-account-mfa-enabled Config initialization: With the Config rules selected, we can click on Finish and AWS Config will start, and it will start to check resources and its associated rules. You might get the dashboard similar to the following screenshot, which speaks about the available resources as well as the rule compliance related graphs: Let's analyze the functionality For demo purposes, I decided to disable the CloudTrail service and if we then look into the Config dashboard, it says that one rule check has been failed: Instead of graphs, Config can also show the resources in a tabular manner if we want to inspect the Config rules with the associated names. This is illustrated in the following diagram: Evaluating changes to resources AWS Config allows us to evaluate the configuration changes that have been made to the resources. This is a great feature that allows us to see how our resource looked a day, a week, or even months back. This feature is particularly useful specifically during incidents when, during investigation, one might want to see what exactly changed before the incident took place. It will help things go much faster. In order to evaluate the changes, we will need to perform the following steps: Go to AWS Config | Resources. This will give you the Resource inventory page in which you can either search for resources based on the resource type or based on tags. For our use case, I am searching for a tag value for an EC2 Instance whose name is OpenVPN: When we go inside the Config timeline, we see the overall changes that have been made to the resource. In the following screenshot, we see that there were a few changes that were made, and Config also shows us the time the changes that were made to the resource: When we click on Changes, it will give you the exact detail on what was the exact change that was made. In our case, it is related to the new network interface, which was attached to the EC2 instance. It displays the network interface ID, description along with the IP address, and the security group, which is attached to that network interface: When we start to integrate the AWS services with Splunk or similar monitoring tools, we can get great graphs, which will help us evaluate things faster. On the side, we always have the logs from the CloudTrail, if we want to see the changes that occurred in detail. We covered log monitoring and how to choose the right log monitoring tool for continuous security monitoring policy. Check out the book Enterprise Cloud Security and Governance to build resilient cloud architectures for tackling data disasters with ease. Cloud Security Tips: Locking Your Account Down with AWS Identity Access Manager (IAM) Monitoring, Logging, and Troubleshooting Analyzing CloudTrail Logs using Amazon Elasticsearch

0
0
3332

article-image-interactive-dashboard-with-vrealize-operations-manager

Vijin Boricha

03 Jul 2018

14 min read

Interactive dashboard with vRealize Operations Manager [Tutorial]

Vijin Boricha

03 Jul 2018

14 min read

0
0
10168

article-image-troubleshooting-in-vrealize-operations-components

Vijin Boricha

06 Jun 2018

11 min read

Troubleshooting techniques in vRealize Operations components

Vijin Boricha

06 Jun 2018

11 min read

0
0
18356

article-image-endpoint-operations-management-agent-vrealize-operations

Vijin Boricha

30 May 2018

10 min read

How to ace managing the Endpoint Operations Management Agent with vROps

Vijin Boricha

30 May 2018

10 min read

0
0
4273

article-image-vmware-vsphere-storage-datastores-snapshots

Packt

21 Feb 2018

9 min read

VMware vSphere storage, datastores, snapshots

Packt

21 Feb 2018

9 min read

VMware vSphere storage, datastores, snapshotsIn this article, byAbhilash G B, author of the book,VMware vSphere 6.5 CookBook - Third Edition, we will cover the following:Managing VMFS volumes detected as snapshotsCreating NFSv4.1 datastores with Kerberos authenticationEnabling storage I/O control (For more resources related to this topic, see here.) IntroductionStorage is an integral part of any infrastructure. It is used to store the files backing your virtual machines.Â The most common way to refer to a type of storage presented to a VMware environment is based on the protocol used and the connection type.Â NFS are storage solutions that can leverage the existing TCP/IP network infrastructure. Hence, they are referred to as IP-based storage.Â Storage IO Control (SIOC) is one of the mechanisms to use ensure a fair share of storage bandwidth allocation to all Virtual Machines running on shared storage, regardless of the ESXi host the Virtual Machines are running on. Managing VMFS volumes detected as snapshotsSome environments maintain copies of the production LUNs as a backup, by replicating them. These replicas are exact copies of the LUNs that were already presented to the ESXi hosts. If for any reason a replicated LUN is presented to an ESXi host, then the host will not mount the VMFS volume on the LUN. This is a precaution to prevent data corruption.Â ESXi identifies each VMFS volume using its signature denoted by aUniversally Unique IdentifierÂ (UUID). The UUID is generated when the volume is first created or resignatured and is stored in the LVM header of the VMFS volume.Â When an ESXi host scans for new LUN ;devices and VMFS volumes on it, it compares the physical device ID (NAA ID) of the LUN with the device ID (NAA ID) value stored in the VMFS volumes LVM header. If it finds a mismatch, then it flags the volume as a snapshot volume.Volumes detected as snapshots are not mounted by default.Â There are two ways to mount such volumes/datastore:Mount by Keeping the Existing Signature Intact -Â This is used when you are attempting to temporarily mount the snapshot volume on an ESXi that doesn't see the original volume. If you were to attempt mounting the VMFS volume by keeping the existing signature and if the host sees the original volume, then you will not be allowed to mount the volume and will be warned about the presence of another VMFS volume with the same UUID:Mount by generating a new VMFS Signature -Â This has to be used if you are mounting a clone or a snapshot of an existing VMFS datastore to the same host/s.Â The process of assigning a new signature will not only update the LVM header with the newly generated UUID, but all the Physical Device ID (NAA ID) of the snapshot LUN. Here, the VMFS volume/datastore will be renamed by prefixing the wordsnap followed by a random number and the name of the original datastore: Getting readyMake sure that the original datastore and its LUN is no longer seen by the ESXi host the snapshot is being mounted to. How to do it...The following procedure will help mount a VMFS volume from a LUN detected as a snapshot:Log in to the vCenter Server using the vSphere Web Client and use the key combination Ctrl+Alt+2Â to switch to the Host and ClustersÂ view.Right click on the ESXi host the snapshot LUN is mapped to and go to Storage | New Datastore.On the New Datastore wizard, select VMFS as the filesystem type and click Next to continue.On the Name and Device selection screen, select the LUN detected as a snaphsot and click Next to continue:On the Mount Option screen, choose to either mount by assigning a new signature or by keeping the existing signature, and click Next to continue:On the Ready to Complete screen, review the setting and click Finish to initiate the operation. Creating NFSv4.1 datastores with Kerberos authenticationVMware introduced support for NFS 4.1 with vSphere 6.0. The vSphere 6.5 added several enhancements:It now supports AES encryptionSupport for IP version 6Support Kerberos's integrity checking mechanismHere, we will learn how to create NFS 4.1 datastores. Although the procedure is similar to NFSv3, there are a few additional steps that needs to be performed. Getting readyFor Kerberos authentication to work, you need to make sure that the ESXi hosts and the NFS Server is joined to the Active Directory domainCreate a new or select an existing AD user for NFS Kerberos authenticationConfigure the NFS Server/Share to Allow access to the AD user chosen for NFS Kerberos authentication How to do it...The following procedure will help you mount an NFS datasture using the NFSv4.1 client with Kerberos authentication enabled:Log in to the vCenter Server using the vSphere Web Client and use the key combination Ctrl+Alt+2 to switch to the Host and Clusters view, select the desired ESXi host andÂ navigate to it Â Configure | System | AuthenticationÂ Services section and supply the credentials of the Active Directory user that was chosen for NFS Kerberon Authentication:Right-click on the desired ESXi host and go to Storage | New DatastoreÂ to bring-up the Add Storage wizard.On the New Datastore wizard, select the Type as NFS and click Next to continue.On the Select NFS version screen, select NFS 4.1Â and click Next to continue. Keep in mind that it is not recommended to mount an NFS Export using both NFS3 and NFS4.1 client.Â On the Name and Configuration screen, supply a Name for the Datastore, the NFS export's folder path and NFS Server's IP Address or FQDN. You can also choose to mount the share as ready-only if desired:On the Configure Kerberos Authentication screen, check theÂ Enable Kerberos-based authentication box and choose the type of authentication required and click Next to continue:On the Ready to Complete screen review the settings and click Finish to mount the NFS export. Enabling storage I/O controlThe use of disk shares will work just fine as long as the datastore is seen by a single ESXi host. Unfortunately, that is not a common case. Datastores are often shared among multiple ESXi hosts. When datastores are shared, you bring in more than one local host scheduler into the process of balancing the I/O among the virtual machines. However, these lost host schedules cannot talk to each other and their visibility is limited to the ESXi hosts they are running on. This easily contributes to a serious problem called thenoisy neighbor situation.Â The job of SIOC is to enable some form of communication between local host schedulers so that I/O can be balanced between virtual machines running on separate hosts.Â How to do it...The following procedure will help you enable SIOC on a datastore:Connect to the vCenter Server using the Web Client and switch to the StorageÂ view using the key combination Ctrl+Alt+4.Right-click on the desired datastore and go to Configure Storage I/O Control:On the Configure Storage I/O Control window, select the checkbox Enable Storage I/O Control, set a custom congestion threshold (only if needed) and click OK to confirm the settings:Â With the Virtual Machine selected from the inventory, navigate to its Configure | General tab and review its datastore capability settings to ensure that SIOC is enabled:Â How it works...As mentioned earlier, SIOC enables communication between these local host schedulers so that I/O can be balanced between virtual machines running on separate hosts. It does so by maintaining a shared file in the datastore that all hosts can read/write/update. When SIOC is enabled on a datastore, it starts monitoring the device latency on the LUN backing the datastore. If the latency crosses the threshold, it throttles the LUN's queue depth on each of the ESXi hosts in an attempt to distribute a fair share of access to the LUN for all the Virtual Machines issuing the I/O.The local scheduler on each of the ESXi hosts maintains an iostatsÂ file to keep its companion hosts aware of the device I/O statistics observed on the LUN. The file is placed in a directory (naa.xxxxxxxxx) on the same datastore.For example, if there are six virtual machines running on three different ESXi hosts, accessing a shared LUN. Among the six VMs, four of them have a normal share value of 1000 and the remaining two have high (2000) disk share value sets on them. These virtual machines have only a single VMDK attached to them. VM-C on host ESX-02 is issuing a large number of I/O operations. Since that is the only VM accessing the shared LUN from that host, it gets the entire queue's bandwidth. This can induce latency on the I/O operations performed by the other VMs: ESX-01 and ESX-03. If the SIOC detects the latency value to be greater than the dynamic threshold, then it will start throttling the queue depth:Â The throttled DQLEN for a VM is calculated as follows:DQLEN for the VM = (VM's Percent of Shares) of (Queue Depth)Example: 12.5 % of 64 â (12.5 * 64)/100 = 8The throttled DQLEN per host is calculated as follows:DQLEN of the Host = Sum of the DQLEN of the VMs on itExample: VM-A (8) + VM-B(16) = 24The following diagram shows the effect of SIOC throttling the queue depth: SummaryIn this article we learnt, how toÂ mount a VMFS volume from a LUN detected as a snapshot, how to mount an NFS datasture using the NFSv4.1 client with Kerberos authentication enabled, and how toÂ enable SIOC on a datastore. Resources for Article: Further resources on this subject: Essentials of VMware vSphere [article] Working with VMware Infrastructure [article] Network Virtualization and vSphere [article]

0
0
3182

Packt

05 Jul 2017

10 min read

KVM Networking with libvirt

Packt

05 Jul 2017

10 min read

0
0
18911

article-image-introduction-sdn-transformation-legacy-sdn

Packt

07 Jun 2017

23 min read

Introduction to SDN - Transformation from legacy to SDN

Packt

07 Jun 2017

23 min read

In this article, by Reza Toghraee, the author of the book, Learning OpenDayLight, we will: What is and what is not SDN Components of a SDN Difference between SDN and overlay The SDN controllers (For more resources related to this topic, see here.) You might have heard about Software-Defined Networking (SDN). If you are in networking industry this is a topic which probably you have studied initially when first time you heard about the SDN.To understand the importance of SDN and SDN controller, let's look at Google. Google silently built its own networking switches and controller called Jupiter. A home grown project which is mostly software driven and supports such massive scale of Google. The SDN base is There is a controller who knows the whole network. OpenDaylight (ODL), is a SDN controller. In other words, it's the central brain of the network. Why we are going towards SDN Everyone who is hearing about SDN, should ask this question that why are we talking about SDN. What problem is it trying to solve? If we look at traditional networking (layer 2, layer 3 with routing protocols such as BGP, OSPF) we are completely dominated by what is so called protocols. These protocols in fact have been very helpful to the industry. They are mostly standard. Different vendor and products can communicate using standard protocols with each other. A Cisco router can establish a BGP session with a Huawei switch or an open source Quagga router can exchange OSPF routes with a Juniper firewall. Routing protocol is a constant standard with solid bases. If you need to override something in your network routing, you have to find a trick to use protocols, even by using a static route. SDN can help us to come out of routing protocol cage, look at different ways to forward traffic. SDN can directly program each switch or even override a route which is installed by routing protocol. There are high-level benefits of using SDN which we explain few of them as follows: An integrated network: We used to have a standalone concept in traditional network. Each switch was managed separately, each switch was running its own routing protocol instance and was processing routing information messages from other neighbors. In SDN, we are migrating to a centralized model, where the SDN controller becomes the single point of configuration of the network, where you will apply the policies and configuration. Scalable layer 2 across layer 3:Having a layer 2 network across multiple layer 3 network is something which all network architects are interested and till date we have been using proprietary methods such as OTV or by using a service provider VPLS service. With SDN, we can create layer 2 networks across multiple switches or layer 3 domains (using VXLAN) and expand the layer 2 networks. In many cloud environment, where the virtual machines are distributed across different hosts in different datacenters, this is a major requirement. Third-party application programmability: This is a very generic term, isn't it? But what I'm referring to is to let other applications communicate with your network. For example,In many new distributed IP storage systems, the IP storage controller has ability to talk to network to provide the best, shortest path to the storage node. With SDN we are letting other applications to control the network. Of course this control has limitation and SDN doesn't allow an application to scrap the whole network. Flexible application based network:In SDN, everything is an application. L2/L3, BGP, VMware Integration, and so on all are applications running in SDN controller. Service chaining:On the fly you add a firewall in the path or a load balancer. This is service insertion. Unified wired and wireless: This is an ideal benefit, to have a controller which supports both wired and wireless network. OpenDaylight is the only controller which supports CAPWAP protocols which allows integration with wireless access points. Components of a SDN A software defined network infrastructure has two main key components: The SDN Controller (only one, could be deployed in a highly available cluster) The SDN enabled switches(multiple switches, mostly in a Clos topology in a datacenter): SDN controller is the single brain of the SDN domain. In fact, an SDN domain is very similar to a chassis based switch. You can imagine supervisor or management module of a chassis based switch as a SDN controller and rest of the line card and I/O cards as SDN switches. The main difference between a SDN network and a chassis based switch is that you can scale out the SDN with multiple switches, where in a chassis based switch you are limited to the number of slots in that chassis: Controlling the fabric It is very important that you understand the main technologies involved in SDN. These methods are used by SDN controllers to manage and control the SDN network. In general, there are two methods available for controlling the fabric: Direct fabric programming: In this method, SDN controller directly communicates with SDN enabled switches via southbound protocols such as OpenFlow, NETCONF and OVSDB. SDN controller programs each switch member with related information about fabric, and how to forward the traffic. Direct fabric programming is the method used by OpenDaylight. Overlay:In overlay method, SDN controller doesn't rely on programing the network switches and routers. Instead it builds an virtual overlay network on top of existing underlay network. The underlay network can be a L2 or L3 network with traditional network switches and router, just providing IP connectivity. SDN controller uses this platform to build the overlay using encapsulation protocols such as VXLAN and NVGRE. VMware NSX uses overlay technology to build and control the virtual fabric. SDN controllers One of the key fundamentals of SDN is disaggregation. Disaggregation of software and hardware in a network and also disaggregation of control and forwarding planes. SDN controller is the main brain and controller of an SDN environment, it's a strategic control point within the network and responsible for communicating information to: Routers and switches and other network devices behind them. SDN controllers uses APIs or protocols (such as OpenFlow or NETCONF) to communicate with these devices. This communication is known as southbound Upstream switches, routers or applications and the aforementioned business logic (via APIs or protocols). This communication is known as northbound. An example for a northbound communication is a BGP session between a legacy router and SDN controller. If you are familiar with chassis based switches like Cisco Catalyst 6500 or Nexus 7k chassis, you can imagine a SDN network as a chassis, with switches and routers as its I/O line cards and SDN controller as its supervisor or management module. Infact SDN is similar to a very scalable chassis where you don't have any limitation on number of physical slots. SDN controller is similar to role of management module of a chassis based switch and it controls all switches via its southbound protocols and APIs. The following table compares the SDN controller and a chassis based switch: SDN Controller Chassis based switch Supports any switch hardware Supports only specific switch line cards Can scale out, unlimited number of switches Limited to number of physical slots in the chassis Supports high redundancy by multiple controllers in a cluster Supports dual management redundancy, active standby Communicates with switches via southbound protocols such as OpenFlow, NETCONF, BGP PCEP Use proprietary protocols between management module and line cards Communicates with routers, switches and applications outside of SDN via northbound protocols such as BGP, OSPF and direct API Communicates with other routers and switches outside of chassis via standard protocols such as BGP, OSPF or APIs. The first protocol that popularized the concept behind SDN was OpenFlow. When conceptualized by networking researchers at Stanford back in 2008, it was meant to manipulate the data plane to optimize traffic flows and make adjustments, so the network could quickly adapt to changing requirements. Version 1.0 of the OpenFlow specification was released in December of 2009; it continues to be enhanced under the management of the Open Networking Foundation, which is a user-led organization focused on advancing the development of open standards and the adoption of SDN technologies. OpenFlow protocol was the first protocol that helped in popularizing SDN. OpenFlow is a protocol designed to update the flow tables in a switch. Allowing the SDN controller to access the forwarding table of each member switch or in other words to connect control plane and data plane in SDN world. Back in 2008, OpenFlow conceptualized by networking researchers at Stanford University, the initial use of OpenFlow was to alter the switch forwarding tables to optimize traffic flows and make adjustments, so the network could quickly adapt to changing requirements. After introduction of OpenFlow, NOX introduced as original OpenFlow controller (still there wasn't concept of SDN controller). NOX was providing a high-level API capable of managing and also developing network control applications. Separate applications were required to run on top of NOX to manage the network.NOX was initially developed by Nicira networks (which acquired by VMware, and finally became part of VMware NSX). NOX introduced along with OpenFlow in 2009. NOX was a closed source product but ultimately it was donated to SDN community which led to multiple forks and sub projects out of original NOX. For example, POX is a sub project of NOX which provides Python support. Both NOX and POX were early controllers. NOX appears an inactive development, however POX is still in use by the research community as it is a Python based project and can be easily deployed. POX is hosted at http://github.com/noxrepo/pox NOX apart from being the first OpenFlow or SDN controller also established a programing model which inherited by other subsequent controllers. The model was based on processing of OpenFlow messages, with each incoming OpenFlow message trigger an event that had to be processed individually. This model was simple to implement but not efficient and robust and couldn't scale. Nicira along with NTT and Google started developing ONIX, which was meant to be a more abstract and scalable for large deployments. ONIX became the base for Nicira (the core of VMware NSX or network virtualization platform) also there are rumors that it is also the base for Google WAN controller. ONIX was planned to become open source and donated to community but for some reasons the main contributors decided to not to do it which forced the SDN community to focus on developing other platforms. Started in 2010, a new controller introduced,the Beacon controller and it became one of the most popular controllers. It born with contribution of developers from Stanford University. Beacon is a Java-based open source OpenFlow controller created in 2010. It has been widely used for teaching, research, and as the basis of Floodlight. Beacon had the first built-in web user-interface which was a huge step forward in the market of SDN controllers. Also it provided a easier method to deploy and run compared to NOX. Beacon was an influence for design of later controllers after it, however it was only supporting star topologies which was one of the limitations on this controller. Floodlight was a successful SDN controller which was built as a fork of Beacon. BigSwitch networks is developing Floodlight along with other developers. In 2013, Beacon popularity started to shrink down and Floodlight started to gain popularity. Floodlight had fixed many issues of Beacon and added lots of additional features which made it one of the most feature rich controllers available. It also had a web interface, a Java-based GUI and also could get integrated with OpenStack using quantum plugin. Integration with OpenStack was a big step forward as it could be used to provide networking to a large pool of virtual machines, compute and storage. Floodlight adoption increased by evolution of OpenStack and OpenStack adopters. This gave Floodlight greater popularity and applicability than other controllers that came before. Most of controllers came after Floodlight also supported OpenStack integration. Floodlight is still supported and developed by community and BigSwitch networks, and is a base for BigCloud Fabric (the BigSwitch's commercial SDN controller). There are other open source SDN controllers which introduced such as Trema (ruby-based from NEC), Ryu (supported by NTT), FlowER, LOOM and the recent OpenMUL. The following table shows the current open source SDN controllers: Active open source SDNcontroller Non-active open source SDN controllers Floodlight Beacon OpenContrail FlowER OpenDaylight NOX LOOM NodeFlow OpenMUL ONOS POX Ryu Trema OpenDaylight OpenDaylight started in early 2013, and was originally led by IBM and Cisco. It was a new collaborative open source project. OpenDaylight hosted under Linux Foundation and draw support and interest from many developers and adopters. OpenDaylight is a platform to provide common foundations and a robust array of services for SDN environments. OpenDaylight uses a controller model which supports OpenFlow as well as other southbound protocols. It is the first open source controller capable of employing non-OpenFlow proprietary control protocols which eventually lets OpenDaylight to integrate with modern and multi-vendor networks. The first release of OpenDaylight in February 2014 with code name of Hydrogen, followed by Helium in September 2014. The Helium release was significant because it marked a change in direction for the platform that has influenced the way subsequent controllers have been architected. The main change was in the service abstraction layer, which is the part of the controller platform that resides just above the southbound protocols, such as OpenFlow, isolating them from the northbound side and where the applications reside. Hydrogen used an API-driven Service Abstraction Layer (AD-SAL), which had limitations specifically, it meant the controller needed to know about every type of device in the network AND have an inventory of drivers to support them. Helium introduced a Model-driven service abstraction layer (MD-SAL), which meant the controller didn't have to account for all the types of equipment installed in the network, allowing it to manage a wide range of hardware and southbound protocols. Helium release made the framework much more agile and adaptable to changes in the applications; an application could now request changes to the model, which would be received by the abstraction layer and forwarded to the network devices. The OpenDaylight platform built on this advancement in its third release, Lithium, which was introduced in June of 2015. This release focused on broadening the programmability of the network, enabling organizations to create their own service architectures to deliver dynamic network services in a cloud environment and craft intent-based policies. Lithium release was worked on by more than 400 individuals, and contributions from Big Switch Networks, Cisco, Ericsson, HP, NEC, and so on, making it one of the fastest growing open source projects ever. The fourth release, Beryllium come out in February of 2016 and the most recent fifth release, Boron released in September 2016. Many vendors have built and developed commercial SDN controller solutions based on OpenDaylight. Each product has enhanced or added features to OpenDaylight to have some differentiating factor. The use of OpenDaylight in different vendor products are: A base, but sell a commercial version with additional proprietary functionality—for example: Brocade, Ericsson, Ciena, and so on. Part of their infrastructure in their Network as a Service (or XaaS) offerings—for example: Telstra, IBM, and so on. Elements for use in their solution—for example: ConteXtream (now part of HP) Open Networking Operating System (ONOS), which was open sourced in December 2014 is focused on serving the needs of service providers. It is not as widely adopted as OpenDaylight, ONOS has been finding success and gaining momentum around WAN use cases. ONOS is backed by numerous organizations including AT&T, Cisco, Fujitsu, Ericsson, Ciena, Huawei, NTT, SK Telecom, NEC, and Intel, many of whom are also participants in and supporters of OpenDaylight. Apart from open source SDN controllers, there are many commercial, proprietary controllers available in the market. Products such as VMware NSX, Cisco APIC, BigSwitch Big Cloud Fabric, HP VAN and NEC ProgrammableFlow are example commercial and proprietary products. The following table lists the commercially available controllers and their relationship to OpenDaylight: ODL-based ODL-friendly Non-ODL based Avaya Cyan (acquired by Ciena) BigSwitch Brocade HP Juniper Ciena NEC Cisco ConteXtream (HP) Nuage Plexxi Coriant PLUMgrid Ericsson Pluribus Extreme Sonus Huawei (also ships non-ODL controller) VMware NSX Core features of SDN Regardless of an open source or a proprietary SDN platform, there are core features and capabilities which requires the SDN platform to support. These capabilities include: Fabric programmability:Providing the ability to redirect traffic, apply filters to packets (dynamically), and leverage templates to streamline the creation of custom applications. Ensuring northbound APIs allow the control information centralized in the controller available to be changed by SDN applications. This will ensure the controller can dynamically adjust the underlying network to optimize traffic flows to use the least expensive path, take into consideration varying bandwidth constraints, meet quality of service (QoS) requirements. Southbound protocol support:Enabling the controller to communicate to switches and routers and manipulate and optimize how they manage the flow of traffic. Currently OpenFlow is the most standard protocol used between different networking vendors, while there are other southbound protocols that can be used. A SDN platform should support different versions of OpenFlow in order to provide compatibility with different switching equipments. External API support:Ensuring the controller can be used within the varied orchestration and cloud environments such as VMware vSphere, OpenStack, and so on. Using APIs the orchestration platform can communicate with SDN platform in order to publish network policies. For example VMware vSphere shall talk to SDN platform to extend the virtual distributed switches(VDS) from virtual environment to the physical underlay network without any requirement form an network engineer to configure the network. Centralized monitoring and visualization:Since SDN controller has a full visibility over the network, it can offer end-to-end visibility of the network and centralized management to improve overall performance, simplify the identification of issues and accelerate troubleshooting. The SDN controller will be able to discover and present a logical abstraction of all the physical links in the network, also it can discover and present a map of connected devices (MAC addresses) which are related to virtual or physical devices connected to the network. The SDN controller support monitoring protocols, such as syslog, snmp and APIs in order to integrate with third-party management and monitoring systems. Performance: Performance in a SDN environment mainly depends on how fast SDN controller fills the flow tables of SDN enabled switches. Most of SDN controllers pre-populate the flow tables on switches to minimize the delay. When a SDN enabled switch receives a packet which doesn't find a matching entry in its flow table, it sends the packet to the SDN controller in order to find where the packet needs to get forwarded to. A robust SDN solution should ensure that the number of requests form switches are minimum and SDN controller doesn't become a bottleneck in the network. High availability and scalability: Controllers must support high availability clusters to ensure reliability and service continuity in case of failure of a controller. Clustering in SDN controller expands to scalability. A modern SDN platform should support scalability in order to add more controller nodes with load balancing in order to increase the performance and availability. Modern SDN controllers support clustering across multiple different geographical locations. Security:Since all switches communicate with SDN controller, the communication channel needs to be secured to ensure unauthorized devices doesn't compromise the network. SDN controller should secure the southbound channels, use encrypted messaging and mutual authentication to provide access control. Apart from that the SDN controller must implement preventive mechanisms to prevent from denial of services attacks. Also deployment of authorization levels and level controls for multi-tenant SDN platforms is a key requirement. Apart from the aforementioned features SDN controllers are likely to expand their function in future. They may become a network operating system and change the way we used to build networks with hardware, switches, SFPs and gigs of bandwidth. The future will look more software defined, as the silicon and hardware industry has already delivered their promises for high performance networking chips of 40G, 100G. Industry needs more time to digest the new hardware and silicons and refresh the equipment with new gears supporting 10 times the current performance. Current SDN controllers In this section, I'm putting the different SDN controllers in a table. This will help you to understand the current market players in SDN and how OpenDaylight relates to them: Vendors/product Based on OpenDaylight? Commercial/open source Description Brocade SDN controller Yes Commercial It's a commercial version of OpenDaylight, fully supported and with extra reliable modules. Cisco APIC No Commercial Cisco Application Policy Infrastructure Controller (APIC) is the unifying automation and management point for the Application Centric Infrastructure (ACI) data center fabric. Cisco uses APIC controller and Nexus 9k switches to build the fabric. Cisco uses OpFlex as main southbound protocol. Erricson SDN controller Yes Commercial Ericsson's SDN controller is a commercial (hardened) version OpenDaylight SDN controller. Domain specific control applications that use the SDN controller as platform form the basis of the three commercial products in our SDN controller portfolio. Juniper OpenContrial /Contrail No Both OpenContrail is opensource, and Contrail itself is a commercial product. Juniper Contrail Networking is an open SDN solution that consists of Contrail controller, Contrail vRouter, an analytics engine, and published northbound APIs for cloud and NFV. OpenContrail is also available for free from Juniper. Contrail promotes and use MPLS in datacenter. NEC Programmable Flow No Commercial NEC provides its own SDN controller and switches. NEC SDN platform is one of choices of enterprises and has lots of traction and some case studies. Avaya SDN Fx controller Yes Commercial Based on OpenDaylight, bundled as a solution package. Big Cloud Fabric No Commercial BigSwitch networks solution is based on Floodlight opensource project. BigCloud Fabric is a robust, clean SDN controller and works with bare metal whitebox switches. BigCloud Fabric includes SwitchLightOS which is a switch operating system can be loaded on whitebox switches with Broadcom Trident 2 or Tomahawk silicons. The benefit of BigCloud Fabric is that you are not bound to any hardware and you can use baremetal switches from any vendor. Ciena's Agility Yes Commercial Ciena's Agility multilayer WAN controller is built atop the open-source baseline of the OpenDaylight Project—an open, modular framework created by a vendor-neutral ecosystem (rather than a vendor-centric ego-system) that will enable network operators to source network services and applications from both Ciena's Agility and others. HP VAN (Virtual Application Network) No Commercial The building block of the HP open SDN ecosystem, the controller allows third-party developers to deliver innovative SDN solutions. Huawei Agile controller Yes and No (based on editions) Commercial Huawei's SDN controller which integrates as a solution with Huawei enterprise switches Nuage No Commercial Nuage Networks VSP provides SDN capabilities for clouds of all sizes. It is implemented as a non-disruptive overlay for all existing virtualized and non-virtualized server and network resources. Pluribus Netvisor No Commercial Netvisor Premium and Open Netvisor Linux are distributed network operating systems. Open Netvisor integrates atraditional, interoperable networking stack (L2/L3/VXLAN) with an SDN distributed controller that runs in everyswitch of the fabric. VMware NSX No Commercial VMware NSX is an Overlay type of SDN, which currently works with VMware vSphere. The plan is to support OpenStack in future. VMware NSX also has built-in firewall, router and L4 load balancers allowing micro segmentation. OpenDaylight as an SDN controller Previously, we went through the role of SDN controller, and a brief history of ODL.ODL is a modular open SDN platform which allows developers to build any network or business application on top of it to drive the network in the way they want. Currently OpenDaylight has reached to its fifth release (Boron, which is the fifth element in periodic table). ODL releases are named based on periodic table elements, started from first release the Hydrogen. ODL has a 6 month release period, with many developers working on expanding the ODL, 2 releases per year is expected from community. For technical readers to understand it more clearly, the following diagram will help: ODL platform has a broad set of use cases for multivendor, brown field, green fields, service providers and enterprises. ODL is a foundation for networks of the future. Service providers are using ODL to migrate their services to a software enabled level with automatic service delivery and coming out of circuit-based mindset of service delivery. Also they work on providing a virtualized CPE with NFV support in order to provide flexible offerings. Enterprises use ODL for many use cases, from datacenter networking, Cloud and NFS, network automation and resource optimization, visibility, control to deploying a fully SDN campus network. ODL uses a MD-SAL which makes it very scalable and lets it incorporate new applications and protocols faster. ODL supports multiple standard and proprietary southbound protocols, for example with full support of OpenFlow and OVSDB, ODL can communicate with any standard hardware (or even the virtual switches such as Open vSwitch(OVS) supporting such protocols). With such support, ODL can be deployed and used in multivendor environments and control hardware from different vendors from a single console no matter what vendor and what device it is, as long as they support standard southbound protocols. ODL uses a micro service architecture model which allows users to control applications, protocols and plugins while deploying SDN applications. Also ODL is able to manage the connection between external consumers and providers. The followingdiagram explains the ODL footprint and different components and projects within the ODL: Micro servicesarchitecture ODL stores its YANG data structure in a common data store and uses messaging infrastructure between different components to enable a model-driven approach to describe the network and functions. In ODL MD-SAL, any SDN application can be integrated as a service and then loaded into the SDN controller. These services (apps) can be chained together in any number and ways to match the application needs. This concept allows users to only install and enable the protocols and services they need which makes the system light and efficient. Also services and applications created by users can be shared among others in the ecosystem since the SDN application deployment for ODL follows a modular design. ODL supports multiple southbound protocols. OpenFlow and OpenFlow extension such as Table Type Patterns (TTP), as well as other protocols including NETCONF, BGP/PCEP, CAPWAP and OVSDB. Also ODL supports Cisco OpFlex protocol: ODL platform provides a framework for authentication, authorization and accounting (AAA), as well as automatic discovery and securing of network devices and controllers. Another key area in security is to use encrypted and authenticated communication trough southbound protocols with switches and routers within the SDN domain. Most of southbound protocols support security encryption mechanisms. Summary In this article we learned about SDN, and why it is important. We reviewed the SDN controller products, the ODL history as well as core features of SDN controllers and market leader controllers. We managed to dive in some details about SDN . Resources for Article: Further resources on this subject: Managing Network Devices [article] Setting Up a Network Backup Server with Bacula [article] Point-to-Point Networks [article]

0
0
6057

article-image-learning-basic-powercli-concepts

Packt

05 Jan 2017

7 min read

Learning Basic PowerCLI Concepts

Packt

05 Jan 2017

7 min read

0
0
3046

article-image-storage-practices-and-migration-hyper-v-2016

Packt

01 Dec 2016

17 min read

Storage Practices and Migration to Hyper-V 2016

Packt

01 Dec 2016

17 min read

In this article by Romain Serre, the author of Hyper-V 2016 Best Practices, we will learn about Why Hyper-V projects fail, Overview of the failover cluster, Storage Replica, Microsoft System Center, Migrating VMware virtual machines, and Upgrading single Hyper-v hosts. (For more resources related to this topic, see here.) Why Hyper-V projects fail Before you start deploying your first production Hyper-V host, make sure that you have completed a detailed planning phase. I have been called in to many Hyper-V projects to assist in repairing what a specialist has implemented. Most of the time, I start by correcting the design because the biggest failures happen there, but are only discovered later during implementation. I remember many projects in which I was called in to assist with installations and configurations during the implementation phases, because these were the project phases where a real expert was needed. However, based on experience, this notion is wrong. Most critical to a successful design phase are two features—its rare existence and someone with technological and organizational experience with Hyper-V. If you don't have the latter, look out for a Microsoft Partner with a Gold Competency called Management and Virtualization on Microsoft Pinpoint (http://pinpoint.microsoft.com) and take a quick look at the reviews done by customers for successful Hyper-V projects. If you think it's expensive to hire a professional, wait until you hire an amateur. Having an expert in the design phase is the best way to accelerate your Hyper-V project. Overview of the failover cluster Before you start your first deployment in production, make sure you have defined the aim of the project and its smart criteria and have done a thorough analysis of the current state. After this, you should be able to plan the necessary steps to reach the target state, including a pilot phase. instantly detected. The virtual machines running on that particular node are powered off immediately because of the hardware failure on their computing node. The remaining cluster nodes then immediately take over these VMs in an unplanned failover process and start them on their respective own hardware. The virtual machines will be the backup running after a successful boot of their operating systems and applications in just a few minutes. Hyper-V Failover Clusters work under the condition that all compute nodes have access to a shared storage instance, holding the virtual machine configuration data and its virtual hard disks. In case of a planned failover, that is, for patching compute nodes, it's possible to move running virtual machines from one cluster node to another without interrupting the VM. All cluster nodes can run virtual machines at the same time, as long as there is enough failover capacity running all services when a node goes down. Even though a Hyper-V cluster is still called a Failover Cluster, utilizing the Windows Server Failover Clustering feature, it is indeed capable of running an Active/Active Cluster. To ensure that all these capabilities of a Failover Cluster are indeed working, it demands an accurate planning and implementation process. Storage Replica Storage Replica is a new feature in Windows Server 2016 that provides block replication from storage level for a data recovery plan or for a stretched cluster. Storage Replica can be used in the following scenarios: Server-to-server storage replication using Storage Replica Storage replication in a stretch cluster using Storage Replica Cluster-to-cluster storage replication using Storage Replica Server-to-itself to replicate between volumes using Storage Replica Regarding the scenario or the bandwidth and the latency of the inter-site link, you can choose between a synchronous and an asynchronous replication. For further information about Storage Replica, you can about read this topic at http://bit.ly/2albebS. The SMB3 protocol is used to make Storage Replica. You can leverage TCP/IP or RDMA on the network. I recommend you to implement RDMA when it is possible to reduce latency and CPU workload and to increase throughput. Compared to Hyper-V Replica, the Storage Replica feature provides a replication of all Virtual Machines stored in a volume from the block level. Moreover, Storage Replica can replicate in the synchronous mode while Hyper-V Replica is always in the asynchronous mode. To finish, with Hyper-V Replica you have to specify the failover IP Address because the replication is executed from the VM level, whereas with Storage Replica, you don't need to specify the failover IP Address, but in case of a replication between two clusters in two different rooms, the VM networks must be configured in the destination room. The use of Hyper-V Replica or Storage Replica depends on the disaster recovery plan you need. If you want to protect some application workloads, you can use Hyper-V Replica. On the other hand, if you have the passive room ready to restart in case of issues in the active room, Storage Replica can be a great solution because all the VMs in a volume will be already replicated. To deploy a replication between two clusters you need two sets of storage based on iSCSI, SAS JBOD, fibre channel SAN, or Shared VHDX. For better performance I recommend you to implement SSD that will be used for the logs of Storage Replica. Microsoft System Center Microsoft System Center 2016 is Microsoft's solution for advanced management of Windows Server and its components, along with its dependencies such as various hardware and software products. It consists of various components that support every stage of your IT services from planning to operating over backup to automation. System Center has existed since 1994 and has evolved continuously. It now offers a great set of tools for very efficient management of server and client infrastructures. It also offers the ability to create and operate whole clouds—run in your own data center or a public cloud data center such as Microsoft Azure. Today, it's your choice whether to run your workloads on-premises or off-premises. System Center provides a standardized set of tools for a unique and consistent Cloud OS management experience. System Center does not add any new features to Hyper-V, but it does offer great ways to make the most out of it and ensure streamlined operating processes after its implementation. System Center is licensed via the same model as Windows Server is, leveraging Standard and data center Editions on a physical host level. While every System Center component offers great value in itself, the binding of multiple components into a single workflow offers even more advantages, as shown in the following screenshot: System Center overview When do you need System Center? There is no right or wrong answer to this, and the most given answer by any IT consultant around the world is, "It depends". System Center adds value to any IT environment starting with only a few systems. In my experience, a Hyper-V environment with up to three hosts, and 15 VMs can be managed efficiently without the use of System Center. If you plan to use more hosts or virtual machines, System Center will definitely be a great solution for you. Let's take a look at the components of System Center. Migrating VMware virtual machines If you are running virtual machines on VMware ESXi hosts, there are really good options available for moving them to Hyper-V. There are different approaches on how to convert a VMware virtual machine to Hyper-V: from the inside of the VM on a guest level, running cold conversions with the VM powered off on the host level, running hot conversions on a running VM, and so on. I will give you a short overview of the currently available tools in the market. System Center VMM SCVMM should not be the first tool of your choice, take a look at MVMC combined with MAT to get an equal functionality from a better working tool. The earlier versions of SCVMM allowed online or offline conversions of VMs; the current version, 2016, allows only offline conversions of VMs. Select a powered-off VM on a VMware host or from the SCVMM library share to start the conversion. The VM conversion will convert VMware-hosted virtual machines through vCenter and ensure that the entire configuration, such as memory, virtual processor, and other machine configurations, is also migrated from the initial source. The tool also adds virtual NICs to the deployed virtual machine on Hyper-V. The VMware tools must be uninstalled before the conversion because you won't be able to remove the VMware tools when the VM is not running on a VMware host. SCVMM 2016 supports ESXi hosts running 4.1 and 5.1 but not the latest ESX Version 5.5. SCVMM conversions are great to automate through their integrated PowerShell support and it's very easy to install upgraded Hyper-V integration services as part of the setup or by adding any kind of automation through PowerShell or System Center Orchestrator. Besides manually removing VMware tools, using SCVMM is an end-to-end solution in the migration process. You can find some PowerShell examples for SCVMM-powered V2V conversion scripts at http://bit.ly/Y4bGp8. I don't recommend the use of this tool anymore because Microsoft doesn't spend time on this tool anymore. Microsoft Virtual Machine Converter Microsoft released its first version of the free solution accelerator Microsoft Virtual Machine Converter (MVMC) in 2013, and it should be available in Version 3.1 by the release of this book. MVMC provides a small and easy option to migrate selected virtual machines to Hyper-V. It takes a very similar approach to the conversion as SCVMM does. The conversion happens at a host level and offers a fully integrated end-to-end solution. MVMC supports all recent versions of VMware vSphere. It will even uninstall the VMware tools and install the Hyper-V integration services. MVMC 2.0 works with all supported Hyper-V guest operating systems, including Linux. MVMC comes with a full GUI wizard as well as a fully scriptable command-line interface (CLI). Besides being a free tool, it is fully supported by Microsoft in case you experience any problems during the migration process. MVMC should be the first tool of your choice if you do not know which tool to use. Like most other conversion tools, it does the actual conversion on the MVMC server itself and requires its disk space to host the original VMware virtual disk as well as the converted Hyper-V disk. MVMC even offers an add-on for VMware virtual center servers to start conversions directly from the vSphere console. The current release of MV is freely available at its official download site at http://bit.ly/1HbRIg7. Download MVMC to the conversion system and start the click-through setup. After finishing the download, start the MVMC with the GUI by executing Mvmc.Gui.exe. The wizard guides you through some choices. MVMC is not only capable of migrating to Hyper-V but also allows you to move virtual machines to Microsoft Azure. Follow these few steps to convert a VMware VM: Select Hyper-V as a target. Enter the name of the Hyper-V host you want this VM to run on and specify a fileshare to use and the format of the disks you want to create. Choosing the dynamically expanding disks should be the best option most of the time. Enter the name of the ESXi server you want to use as a source as well as valid credentials. Select the virtual machine to convert. Make sure it has VMware tools installed. The VM can be either powered on or off. Enter a workspace folder to store the converted disk. Wait for the process to finish. There is some additional guidance available at http://bit.ly/1vBqj0U. This is a great and easy way to migrate a single virtual machine. Repeat the steps for every other virtual machine you have, or use some automation. Upgrading single Hyper-V hosts If you are currently running a single host with an older version of Hyper-V and now want to upgrade this host on the same hardware, there is a limited set of decisions to be made. You want to upgrade the host with the least amount of downtime and without losing any data from your virtual machine. Before you start the upgrade process, make sure all components from your infrastructure are compatible with the new version of Hyper-V. Then it's time to prepare your hardware for this new version of Hyper-V by upgrading all firmware to the latest available version and downloading the necessary drivers for Windows Server 2016 with Hyper-V along with its installation media. One of the most crucial questions in this update scenario is whether you should use the integrated installation option called in-place upgrade, where the existing operating system will be transformed to the recent version of Hyper-V or delete the current operating system and perform a clean installation. While the installation experience of in-place upgrades works well when only the Hyper-V role is installed, based on experiences, some versions of upgraded systems are more likely to suffer problems. Numbers pulled from the Elanity support database show about 15 percent more support cases on upgraded systems from Windows Server 2008 R2 than clean installations. Remember how fast and easy it is nowadays to do a clean install of Hyper-V; this is why it is highly recommended to do this over upgrading existing installations. If you are currently using Windows Server 2012 R2 and want to upgrade to Windows Server 2016, note that we have not yet seen any differences in the number of support cases between the installation methods. However, clean installations of Hyper-V being so fast and easy, I barely use them. Before starting any type of upgrade scenario, make sure you have current backups of all affected virtual machines. Nonetheless, if you want to use the in-place upgrade, insert the Windows Server 2016 installation media and run this command from your current operating system: Setup.exe /auto:upgrade If it fails, it's most likely due to an incompatible application installed on the older operating system. Start the setup without the parameter to find out which applications need to be removed before executing the unattended setup. If you upgrade from Windows Server 2012 R2, there is no additional preparation needed; if you upgrade from older operating systems, make sure to remove all snapshots from your virtual machines. Importing virtual machines If you choose to do a clean installation of the operating systems, you do not necessarily have to export the virtual machines first; just make sure all VMs are powered off and are stored on a different partition than your Hyper-V host OS. If you are using a SAN, disconnect all LUNs before the installation and reconnect them afterwards to ensure their integrity through the installation process. After the installation process, just reconnect the LUNs and set the disk online in diskpart or in Disk Management at Control Panel | Computer Management. If you are using local disks, make sure not to reformat the partition with your virtual machines on it. You should export VM to another location and import them back after reformatting; more efforts are required but it is safer. Set the partition online and then reimport the virtual machines. Before you start the reimport process, make sure all dependencies of your virtual machines are available, especially vSwitches. To import a single Hyper-V VM, use the following PowerShell cmdlet: Import-VM -Path 'D:VMsVM01Virtual Machines2D5EECDA-8ECC-4FFC- ACEE-66DAB72C8754.xml' To import all virtual machines from a specific folder, use this command: Get-ChildItem d:VMs -Recurse -Filter "Virtual Machines" | %{Get- ChildItem $_.FullName -Filter *.xml} | %{import-vm $_.FullName - Register} After that, all VMs are registered and ready for use on your new Hyper-V hosts. Make sure to update the Hyper-V integration services of all virtual machines before going back into production. If you still have virtual disks in the old .vhd format, it's now time to convert them to .vhdx files. Use this PowerShell cmdlet on powered-off VMs or standalone vDisk to convert a single .vhd file: Convert-VHD –Path d:VMstestvhd.vhd –DestinationPath d:VMstestvhdx.vhdx If you want to convert the disks of all your VMs, fellow MVPs, Aidan Finn and Didier van Hoye, provided a great end-to-end solution to achieve this. This can be found at http://bit.ly/1omOagi. I often hear from customers that they don't want to upgrade their disks, so as to be able to revert to older versions of Hyper-V when needed. First, you should know that I have never met a customer who has done that because there really is no technical reason why anyone should do this. Second, even if you would do this backwards move, running virtual machines on older Hyper-V hosts is not supported, if they had been deployed on more modern versions of Hyper-V before. The reason for this is very simple; Hyper-V does not offer a way for downgrading Hyper-V integration services. The only way to move a virtual machine back to an older Hyper-V host is by restoring a backup of the VM made earlier before the upgrade process. Exporting virtual machines If you want to use another physical system running a newer version of Hyper-V, you have multiple possible options. They are as follows: When using a SAN as a shared storage, make sure all your virtual machines, including their virtual disks, are located on other LUNs rather than on the host operating system. Disconnect all LUNs hosting virtual machines from the source host and connect them to the target host. Bulk import the VMs from the specified folders. When using SMB3 shared storage from scale-out file servers, make sure to switch access to the shared hosting VMs to the new Hyper-V hosts. When using local hard drives and upgrading from Windows Server 2008 SP2 or Windows Server 2008 R2 with Hyper-V, it's necessary to export the virtual machines to a storage location reachable from the new host. Hyper-V servers running legacy versions of the OS (prior to 2012 R2) need to power off the VMs before an export can occur. To export a virtual machine from a host, use the following PowerShell cmdlet: Export-VM –Name VM –Path D: To export all virtual machines to a folder underneath the following root, use the following command: Get-VM | Export-VM –Path D: In most cases, it is also possible to just copy the virtual machine folders containing virtual hard disk's and configuration files to the target location and import them to Windows Server 2016 Hyper-V hosts. However, the export method is more reliable and should be preferred. A good alternative for moving virtual machines can be the recreation of virtual machines. If you have another host up and running with a recent version of Hyper-V, it may be a good opportunity to also upgrade some guest OSes. For instance, Windows Server 2003 and 2003 R2 are out of extended support since July 2015. Depending on your applications, it may now be the right choice to create new virtual machines with Windows Server 2016 as a guest operating system and migrate your existing workloads from older VMs to these new machines. Summary In this article, we learned about Why Hyper-V projects fail, how to migrate VMware virtual machine and also about upgrading single Hyper-V hosts. This article also covers the overview of the failover cluster and Storage Replica. Resources for Article: Further resources on this subject: Hyper-V Basics [article] The importance of Hyper-V Security [article] Hyper-V building blocks for creating your Microsoft virtualization platform [article]

0
0
3948

Packt

08 Nov 2016

13 min read

A Configuration Guide

Packt

08 Nov 2016

13 min read

In this article by Dimitri Aivaliotis, author Mastering NGINX - Second Edition, The NGINX configuration file follows a very logical format. Learning this format and how to use each section is one of the building blocks that will help you to create a configuration file by hand. Constructing a configuration involves specifying global parameters as well as directives for each individual section. These directives and how they fit into the overall configuration file is the main subject of this article. The goal is to understand how to create the right configuration file to meet your needs. (For more resources related to this topic, see here.) The basic configuration format The basic NGINX configuration file is set up in a number of sections. Each section is delineated as shown: <section> { <directive> <parameters>; } It is important to note that each directive line ends with a semicolon (;). This marks the end of line. The curly braces ({}) actually denote a new configuration context, but we will read these as sections for the most part. The NGINX global configuration parameters The global section is used to configure the parameters that affect the entire server and is an exception to the format shown in the preceding section. The global section may include configuration directives, such as user and worker_processes, as well as sections, such as events. There are no open and closing braces ({}) surrounding the global section. The most important configuration directives in the global context are shown in the following table. These configuration directives will be the ones that you will be dealing with for the most part. Global configuration directives Explanation user The user and group under which the worker processes run is configured using this parameter. If the group is omitted, a group name equal to that of the user is used. worker_processes This directive shows the number of worker processes that will be started. These processes will handle all the connections made by the clients. Choosing the right number depends on the server environment, the disk subsystem, and the network infrastructure. A good rule of thumb is to set this equal to the number of processor cores for CPU-bound loads and to multiply this number by 1.5 to 2 for the I/O bound loads. error_log This directive is where all the errors are written. If no other error_log is given in a separate context, this log file will be used for all errors, globally. A second parameter to this directive indicates the level at which (debug, info, notice, warn, error, crit, alert, and emerg) errors are written in the log. Note that the debug-level errors are only available if the --with-debug configuration switch is given at compilation time. pid This directive is the file where the process ID of the main process is written, overwriting the compiled-in default. use This directive indicates the connection processing method that should be used. This will overwrite the compiled-in default and must be contained in an events context, if used. It will not normally need to be overridden, except when the compiled-in default is found to produce errors over time. worker_connections This directive configures the maximum number of simultaneous connections that a worker process may have opened. This includes, but is not limited to, client connections and connections to upstream servers. This is especially important on reverse proxy servers—some additional tuning may be required at the operating system level in order to reach this number of simultaneous connections. Here is a small example using each of these directives: # we want nginx to run as user 'www' user www; # the load is CPU-bound and we have 12 cores worker_processes 12; # explicitly specifying the path to the mandatory error log error_log /var/log/nginx/error.log; # also explicitly specifying the path to the pid file pid /var/run/nginx.pid; # sets up a new configuration context for the 'events' module events { # we're on a Solaris-based system and have determined that nginx # will stop responding to new requests over time with the default # connection-processing mechanism, so we switch to the second-best use /dev/poll; # the product of this number and the number of worker_processes # indicates how many simultaneous connections per IP:port pair are # accepted worker_connections 2048; } This section will be placed at the top of the nginx.conf configuration file. Using the include files The include files can be used anywhere in your configuration file, to help it be more readable and to enable you to reuse parts of your configuration. To use them, make sure that the files themselves contain the syntactically correct NGINX configuration directives and blocks; then specify a path to those files: include /opt/local/etc/nginx/mime.types; A wildcard may appear in the path to match multiple files: include /opt/local/etc/nginx/vhost/*.conf; If the full path is not given, NGINX will search relative to its main configuration file. A configuration file can easily be tested by calling NGINX as follows: nginx -t -c <path-to-nginx.conf> This command will test the configuration, including all files separated out into the include files, for syntax errors. Sample configuration The following code is an example of an HTTP configuration section: http { include /opt/local/etc/nginx/mime.types; default_type application/octet-stream; sendfile on; tcp_nopush on; tcp_nodelay on; keepalive_timeout 65; server_names_hash_max_size 1024; } This context block would go after any global configuration directives in the nginx.conf file. The virtual server section Any context beginning with the keyword server is considered as a virtual server section. It describes a logical separation of a set of resources that will be delivered under a different server_name directive. These virtual servers respond to the HTTP requests, and are contained within the http section. A virtual server is defined by a combination of the listen and server_name directives. The listen directive defines an IP address/port combination or path to a UNIX-domain socket: listen address[:port]; listen port; listen unix:path; The listen directive uniquely identifies a socket binding under NGINX. There are a number of optional parameters that listen can take: The listen parameters Explanation Comments default_server This parameter defines this address/port combination as being the default value for the requests bound here. setfib This parameter sets the corresponding FIB for the listening socket. This parameter is only supported on FreeBSD and not for UNIX-domain sockets. backlog This parameter sets the backlog parameter in the listen() call. This parameter defaults to -1 on FreeBSD and 511 on all other platforms. rcvbuf This parameter sets the SO_RCVBUF parameter on the listening socket. sndbuf This parameter sets the SO_SNDBUF parameter on the listening socket. accept_filter This parameter sets the name of the accept filter to either dataready or httpready. This parameter is only supported on FreeBSD. deferred This parameter sets the TCP_DEFER_ACCEPT option to use a deferred accept() call. This parameter is only supported on Linux. bind This parameter makes a separate bind() call for this address/port pair. A separate bind() call will be made implicitly if any of the other socket-specific parameters are used. ipv6only This parameter sets the value of the IPV6_ONLY parameter. This parameter can only be set on a fresh start and not for UNIX-domain sockets. ssl This parameter indicates that only the HTTPS connections will be made on this port. This parameter allows for a more compact configuration. so_keepalive This parameter configures the TCP keepalive connection for the listening socket. The server_name directive is fairly straightforward and it can be used to solve a number of configuration problems. Its default value is "", which means that a server section without a server_name directive will match a request that has no Host header field set. This can be used, for example, to drop requests that lack this header: server { listen 80; return 444; } The nonstandard HTTP code, 444, used in this example will cause NGINX to immediately close the connection. Besides a normal string, NGINX will accept a wildcard as a parameter to the server_name directive: The wildcard can replace the subdomain part: *.example.com The wildcard can replace the top-level domain part: www.example.* A special form will match the subdomain or the domain itself: .example.com (matches *.example.com as well as example.com) A regular expression can also be used as a parameter to server_name by prepending the name with a tilde (~): server_name ~^www.example.com$; server_name ~^www(d+).example.(com)$; The latter form is an example using captures, which can later be referenced (as $1, $2, and so on) in further configuration directives. NGINX uses the following logic when determining which virtual server should serve a specific request: Match the IP address and port to the listen directive. Match the Host header field against the server_name directive as a string. Match the Host header field against the server_name directive with a wildcard at the beginning of the string. Match the Host header field against the server_name directive with a wildcard at the end of the string. Match the Host header field against the server_name directive as a regular expression. If all the Host headers match fail, direct to the listen directive marked as default_server. If all the Host headers match fail and there is no default_server, direct to the first server with a listen directive that satisfies step 1. This logic is expressed in the following flowchart: The default_server parameter can be used to handle requests that would otherwise go unhandled. It is therefore recommended to always set default_server explicitly so that these unhandled requests will be handled in a defined manner. Besides this usage, default_server may also be helpful in configuring a number of virtual servers with the same listen directive. Any directives set here will be the same for all matching server blocks. Locations – where, when, and how The location directive may be used within a virtual server section and indicates a URI that comes either from the client or from an internal redirect. Locations may be nested with a few exceptions. They are used for processing requests with as specific configuration as possible. A location is defined as follows: location [modifier] uri {...} Or it can be defined for a named location: location @name {…} A named location is only reachable from an internal redirect. It preserves the URI as it was before entering the location block. It may only be defined at the server context level. The modifiers affect the processing of a location in the following way: Location modifiers Handling = This modifier uses exact match and terminate search. ~ This modifier uses case-sensitive regular expression matching. ~* This modifier uses case-insensitive regular expression matching. ^~ This modifier stops processing before regular expressions are checked for a match of this location's string, if it's the most specific match. Note that this is not a regular expression match—its purpose is to preempt regular expression matching. When a request comes in, the URI is checked against the most specific location as follows: Locations without a regular expression are searched for the most-specific match, independent of the order in which they are defined. Regular expressions are matched in the order in which they are found in the configuration file. The regular expression search is terminated on the first match. The most-specific location match is then used for request processing. The comparison match described here is against decoded URIs; for example, a %20 instance in a URI will match against a "" (space) specified in a location. A named location may only be used by internally redirected requests. The following directives are found only within a location: Location-only directives Explanation alias This directive defines another name for the location, as found on the filesystem. If the location is specified with a regular expression, alias should reference captures defined in that regular expression. The alias directive replaces the part of the URI matched by the location such that the rest of the URI not matched will be searched for in that filesystem location. Using the alias directive is fragile when moving bits of the configuration around, so using the root directive is preferred, unless the URI needs to be modified in order to find the file. internal This directive specifies a location that can only be used for internal requests (redirects defined in other directives, rewrite requests, error pages, and so on.) limit_except This directive limits a location to the specified HTTP verb(s) (GET also includes HEAD). Additionally, a number of directives found in the http section may also be specified in a location. Refer to Appendix A, Directive Reference, for a complete list. The try_files directive deserves special mention here. It may also be used in a server context, but will most often be found in a location. The try_files directive will do just that—try files in the order given as parameters; the first match wins. It is often used to match potential files from a variable and then pass processing to a named location, as shown in the following example: location / { try_files $uri $uri/ @mongrel; } location @mongrel { proxy_pass http://appserver; } Here, an implicit directory index is tried if the given URI is not found as a file and then processing is passed on to appserver via a proxy. We will explore how best to use location, try_files, and proxy_pass to solve specific problems throughout the rest of the article. Locations may be nested except in the following situations: When the prefix is = When the location is a named location Best practice dictates that regular expression locations be nested inside the string-based locations. An example of this is as follows: # first, we enter through the root location / { # then we find a most-specific substring # note that this is not a regular expression location ^~ /css { # here is the regular expression that then gets matched location ~* /css/.*.css$ { } } } Summary In this article, we saw how the NGINX configuration file is built. Its modular nature is a reflection, in part, of the modularity of NGINX itself. A global configuration block is responsible for all aspects that affect the running of NGINX as a whole. There is a separate configuration section for each protocol that NGINX is responsible for handling. We may further define how each request is to be handled by specifying servers within those protocol configuration contexts (either http or mail) so that requests are routed to a specific IP address/port. Within the http context, locations are then used to match the URI of the request. These locations may be nested, or otherwise ordered to ensure that requests get routed to the right areas of the filesystem or application server. Resources for Article: Further resources on this subject: Zabbix Configuration [article] Configuring the essential networking services provided by pfSense [article] Configuring a MySQL linked server on SQL Server 2008 [article]

0
0
2331

How-To Tutorials - Virtualization

New for 2020 in operations and infrastructure engineering

Chaos engineering comes to Kubernetes thanks to Gremlin

6 signs you need containers

It's Black Friday: But what's the business (and developer) cost of downtime?

Chaos Conf 2018 Recap: Chaos engineering hits maturity as community moves towards controlled experimentation

Log monitoring tools for continuous security monitoring policy [Tutorial]

Interactive dashboard with vRealize Operations Manager [Tutorial]

Troubleshooting techniques in vRealize Operations components

How to ace managing the Endpoint Operations Management Agent with vROps

VMware vSphere storage, datastores, snapshots

Trending Topics

KVM Networking with libvirt

Introduction to SDN - Transformation from legacy to SDN

Learning Basic PowerCLI Concepts

Storage Practices and Migration to Hyper-V 2016

A Configuration Guide