Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon

How-To Tutorials - Virtualization

115 Articles
article-image-new-for-2020-in-operations-and-infrastructure-engineering
Richard Gall
19 Dec 2019
5 min read
Save for later

New for 2020 in operations and infrastructure engineering

Richard Gall
19 Dec 2019
5 min read
It’s an exciting time if you work in operations and software infrastructure. Indeed, you could even say that as the pace of change and innovation increases, your role only becomes more important. Operations and systems engineers, solution architects, everyone - you’re jobs are all about bringing stability, order and control into what can sometimes feel like chaos. As anyone that’s been working in the industry knows, managing change, from a personal perspective, requires a lot of effort. To keep on top of what’s happening in the industry - what tools are being released and updated, what approaches are gaining traction - you need to have one eye on the future and the wider industry. To help you with that challenge and get you ready for 2020, we’ve put together a list of what’s new for 2020 - and what you should start learning. Learn how to make Kubernetes work for you It goes without saying that Kubernetes was huge in 2019. But there are plenty of murmurs and grumblings that it’s too complicated and adds an additional burden for engineering and operations teams. To a certain extent there’s some truth in this - and arguably now would be a good time to accept that just because it seems like everyone is using Kubernetes, it doesn’t mean it’s the right solution for you. However, having said that, 2020 will be all about understanding how to make Kubernetes relevant to you. This doesn’t mean you should just drop the way you work and start using Kubernetes, but it does mean that spending some time with the platform and getting a better sense of how it could be used in the future is a useful way to spend your learning time in 2020. Explore Packt's extensive range of Kubernetes eBooks and videos on the Packt store. Learn how to architect If software has eaten the world, then by the same token perhaps complexity has well and truly eaten software as we know it. Indeed, Kubernetes is arguably just one of the symptoms and causes of this complexity. Another is the growing demand for architects in engineering and IT teams. There are a number of different ‘architecture’ job roles circulating across the industry, from solutions architect to application architect. While they each have their own subtle differences, and will even vary from company to company, they’re all roles that are about organizing and managing different pieces into something that is both stable and value-driving. Cloud has been particularly instrumental in making architect roles more prominent in the industry. As organizations look to resist the pitfalls of lock-in and better manage resources (financial and otherwise), it will be down to architects to balance business and technology concerns carefully. Learn how to architect cloud native applications. Read Architecting Cloud Computing Solutions. Get to grips with everything you need to know to be a software architect. Pick up Software Architect's Handbook. Artificial intelligence It’s strange that the hype around AI doesn’t seem to have reached the world of ops. Perhaps this is because the area is more resistant to the spin that comes with AI, preferring instead to focus more on the technical capabilities of tools and platforms. Whatever the case, it’s nevertheless true that AI will play an important part in how we manage and secure infrastructure. From monitoring system health, to automating infrastructure deployments and configuration, and even identifying security threats, artificial intelligence is already an important component for operations engineers and others. Indeed, artificial intelligence is being embedded inside products and platforms that ops teams are using - this means the need to ‘learn’ artificial intelligence is somewhat reduced. But it would be wrong to think it’s something that can just be managed from a dashboard. In 2020 it will be essential to better understand where and how artificial intelligence can fit into your operations and architectural toolchain. Find artificial intelligence eBooks and videos in Packt's collection of curated data science bundles. Observability, monitoring, tracing, and logging One of the challenges of software complexity is understanding exactly what’s going on under the hood. Yes, the network might be unreliable, as the saying goes, but what makes things even worse is that we’re not even sure why. This is where observability and the next generation of monitoring, logging and tracing all come into play. Having detailed insights into how applications and infrastructures are performing, how resources are being managed, and what things are actually causing problems is vitally important from a team perspective. Without the ability to understand these things, it can put pressure on teams as knowledge becomes siloed inside the brains of specific engineers. It makes you vulnerable to failure as you start to have points of failure at a personnel level. There are, of course, a wide range of tools and products available that can make monitoring and tracing easy (or easier, at least). But understanding which ones are right for your needs still requires some time learning and exploring the options out there. Make sure you do exactly that in 2020. Learn how to monitor distributed systems with Learn Centralized Logging and Monitoring with Kubernetes. Making serverless a reality We’ve talked about serverless a lot this year. But as a concept there’s still considerable confusion about what role it should play in modern DevOps processes. Indeed, even the nomenclature is a little confusing. Platforms using their own terminology, such as ‘lambdas’ and ‘functions’, only adds to the sense that serverless is something amorphous and hard to pin down. So, in 2020, we need to work out how to make serverless work for us. Just as we need to consider how Kubernetes might be relevant to our needs, we need to consider in what ways serverless represents both a technical and business opportunity. Search Packt's library for the latest serverless eBooks and videos. Explore more technology eBooks and videos on the Packt store.
Read more
  • 0
  • 0
  • 4617

article-image-chaos-engineering-comes-to-kubernetes-thanks-to-gremlin
Richard Gall
18 Nov 2019
2 min read
Save for later

Chaos engineering comes to Kubernetes thanks to Gremlin

Richard Gall
18 Nov 2019
2 min read
Kubernetes causes problems. Just last week Cindy Sridharan wrote on Twitter that while Docker "succeeded... because it was a great developer tool," Kubernetes "decided to be all things tech and not much by way of UX. It was and remains a hostile piece of software to learn, run, operate, maintain." https://twitter.com/copyconstruct/status/1194701905248673792?s=20 That's just a drop in the ocean - you don't have to look hard to find more hot takes, jokes, and memes about how complicated working with Kubernetes can feel. Despite all this, it's certainly here to stay. That makes chaos engineering platform Gremlin's announcement that the platform will offer native support for Kubernetes particularly welcome. Citing container orchestration research done by Datadog in the press release, which indicates the rapid rate of Kubernetes adoption, Gremlin is hoping that it can provide some additional support for users that might be concerned about the platforms complexity. From last year: Gremlin makes chaos engineering with Docker easier with new container discovery feature Gremlin CTO Matt Fornaciari said "our goal is to provide SRE and DevOps teams that are building and deploying modern applications with the tools and processes necessary to understand how their systems handle failure, before that failure has the chance to impact customers and business." The new feature is designed to help engineers do exactly that by allowing them "to automate the process of identifying Kubernetes primitives such as nodes and pods," and to select and attack traffic from different services within Kubernetes. The other important element to all this is that Gremlin wants to make things as straightforward as possible for engineering teams. With a neat and easy to use UI, it would seem that, to return to Sridharan's words, the team are eager to make sure their product is "a great developer tool."   The tool has already been tried and tested in the wild. Simon Govier, Expedia's Director of Program Management described how performing chaos experiments on Kubernetes with Gremlin "significantly reduces the amount of time it takes to do fault injection and increases our systems' resilience to failure." Learn more on the Gremlin website.
Read more
  • 0
  • 0
  • 4054

article-image-6-signs-you-need-containers
Richard Gall
05 Feb 2019
9 min read
Save for later

6 signs you need containers

Richard Gall
05 Feb 2019
9 min read
I’m not about to tell you containers is a hot new trend - clearly, it isn’t. Today, they are an important part of the mainstream software development industry that probably won't be disappearing any time soon. But while containers certainly can’t be described as a niche or marginal way of deploying applications, they aren’t necessarily ubiquitous. There are still developers or development teams yet to fully appreciate the usefulness of containers. You might know them - you might even be one of them. Joking aside, there are often many reasons why people aren’t using containers. Sometimes these are good reasons: maybe you just don’t need them. Often, however, you do need them, but the mere thought of changing your systems and workflow can feel like more trouble than it’s worth. If everything seems to be (just about) working, why shake things up? Well, I’m here to tell you that more often than not it is worthwhile. But to know that you’re not wasting your time and energy, there are a few important signs that can tell you if you should be using containers. Download Containerize Your Apps with Docker and Kubernetes for free, courtesy of Microsoft.  Your codebase is too complex There are few developers in the world who would tell you that their codebase couldn’t do with a little pruning and simplification. But if your code has grown into a beast that everyone fears and doesn’t really understand, containers could probably help you a lot. Why do containers help simplify your codebase? Let’s think about how spaghetti code actually happens. Yes, it always happens by accident, but usually it’s something that evolves out of years of solving intractable problems with knock on effects and consequences that only need to be solved later. By using containers you can begin to think differently about your code. Instead of everything being tied up together, like a complex concrete network of road junctions, containers allow you to isolate specific parts of it. When you can better isolate your code, you can also isolate different problems and domains. This is one of the reasons that containers is so closely aligned with microservices. Software testing is nightmarish The efficiency benefits of containers are well documented, but the way containers can help the software testing process is often underplayed - this probably says more about a general inability to treat testing with the respect and time it deserves as much as anything else. How do containers make testing easier? There are a number of reasons containers make software testing easier. On the one hand, by using containers you’re reducing that gap between the development environment and production, which means you shouldn’t be faced with as many surprises once your code hits production as you sometimes might. Containers also make the testing process faster - you only need to test against a container image, you don’t need a fully-fledged testing environment for every application you do tests on. What this all boils down to is that testing becomes much quicker and easier. In theory, then, this means the testing process fits much more neatly within the development workflow. Code quality should never be seen as a bottleneck; with containers it becomes much easier to embed the principle in your workflow. Read next: How to build 12 factor microservices on Docker Your software isn’t secure - you’ve had breaches that could have been prevented Spaghetti code, lack of effective testing can lead to major security risks. If no one really knows what’s going on inside your applications and inside your code it’s inevitable that you’ll have vulnerabilities. And, in turn, it’s highly likely these vulnerabilities will be exploited. How can containers make software security easier? Because containers allow you to make changes to parts of your software infrastructure (rather than requiring wholesale changes), this makes security patches much easier to achieve. Essentially, you can isolate the problem and tackle it. Without containers, it becomes harder to isolate specific pieces of your infrastructure, which means any changes could have a knock on effect on other parts of your code that you can’t predict. That all being said, it probably is worth mentioning that containers do still pose a significant set of security challenges. While simplicity in your codebase can make testing easier, you are replacing simplicity at that level with increased architectural complexity. To really feel the benefits of container security, you need a strong sense of how your container deployments are working together and how they might interact. Your software infrastructure is expensive (you feel the despair of vendor lock-in) Running multiple virtual machines can quickly get expensive. In terms of both storage and memory, if you want to scale up, you’re going to be running through resources at a rapid rate. While you might end up spending big on more traditional compute resources, the tools around container management and automation are getting cheaper. One of the costs of many organization’s software infrastructure is lock-in. This isn’t just about price, it’s about the restrictions that come with sticking with a certain software vendor - you’re spending money on software systems that are almost literally restricting your capacity for growth and change. How do containers solve the software infrastructure problem and reduce vendor lock-in? Traditional software infrastructure - whether that’s on-premise servers or virtual ones - is a fixed cost - you invest in the resources you need, and then either use it or you don’t. With containers running on, say, cloud, it becomes a lot easier to manage your software spend alongside strategic decisions about scalability. Fundamentally, it means you can avoid vendor lock-in. Yes, you might still be paying a lot of money for AWS or Azure, but because containers are much more portable, moving your applications between providers is much less hassle and risk. Read next: CNCF releases 9 security best practices for Kubernetes, to protect a customer’s infrastructure DevOps is a war, not a way of working Like containers, DevOps could hardly be considered a hot new trend any more. But this doesn’t mean it’s now part of the norm. There are plenty of organizations that simply don’t get DevOps, or, at the very least, seem to be stumbling their way through sprint meetings with little real alignment between development and operations. There could be multiple causes for this conflict (maybe people just don’t get on), but DevOps often fails where the code that’s being written and deployed is too complicated for anyone to properly take accountability. This takes us back to the issue of the complex codebase. Think of it this way - if code is a gigantic behemoth that can’t be easily broken up, the unintended effects and consequences of every new release and update can cause some big problems - both personally and technically. How do containers solve DevOps challenges? Containers can help solve the problems that DevOps aims to tackle by breaking up software into different pieces. This means that developers and operations teams have much more clarity on what code is being written and why, as well as what it should do. Indeed, containers arguably facilitate DevOps practices much more effectively than DevOps proponents have been trying to do in pre-container years. Adding new product features is a pain The issue of adding features or improving applications is a complaint that reaches far beyond the development team. Product management, marketing - these departments will all bemoan the ability to make necessary changes or add new features that they will argue is business critical. Often, developers will take the heat. But traditional monolithic applications make life difficult for developers - you simply can’t make changes or updates. It’s like wanting to replace a radiator and having to redo your house’s plumbing. This actually returns us to the earlier point about DevOps - containers makes DevOps easier because it enables faster delivery cycles. You can make changes to an application at the level of a container or set of containers. Indeed, you might even simply kill one container and replace it with a new one. In turn, this means you can change and build things much more quickly. How do containers make it easier to update or build new features? To continue with the radiator analogy: containers would allow you to replace or change an individual radiator without having to gut your home. Essentially, if you want to add a new feature or change an element, you wouldn’t need to go into your application and make wholesale changes - that may have unintended consequences - instead, you can simply make a change by running the resources you need inside a new container (or set of containers). Watch for the warning signs As with any technology decision, it’s well worth paying careful attention to your own needs and demands. So, before fully committing to containers, or containerizing an application, keep a close eye on the signs that they could be a valuable option. Containers may well force you to come face to face with the reality of technical debt - and if it does, so be it. There’s no time like the present, after all. Of course, all of the problems listed above are ultimately symptoms of broader issues or challenges you face as a development team or wider organization. Containers shouldn’t be seen as a sure-fire corrective, but they can be an important element in changing your culture and processes. Learn how to containerize your apps with a new eBook, free courtesy of Microsoft. Download it here.
Read more
  • 0
  • 0
  • 5503
Banner background image

article-image-its-black-friday-but-whats-the-business-and-developer-cost-of-downtime
Richard Gall
23 Nov 2018
4 min read
Save for later

It's Black Friday: But what's the business (and developer) cost of downtime?

Richard Gall
23 Nov 2018
4 min read
Black Friday is back, and, as you've probably already noticed, with a considerable vengeance. According to Adobe Analytics data, online spending is predicted to hit $3.7 billion over this holiday season in the U.S, up from $2.9 billion in 2017. But while consumers clamour for deals and businesses reap the rewards, it's important to remember there's a largely hidden plane of software engineering labour. Without this army of developers, consumers will most likely be hitting their devices in frustration, while business leaders will be missing tough revenue targets - so, as we enter into Black Friday let's pour one out for all those engineers on call and trying their best to keep eCommerce sites on their feet. Here's to the software engineers keeping things running on Black Friday Of course, the pain that hits on days like Black Friday and Cyber Monday can be minimised with smart planning and effective decision making long before those sales begin. However, for engineering teams under-resourced and lacking the right tools, that is simply impossible. This means that software engineers are left in a position where they're treading water, knowing that they're going to be sinking once those big days come around. It doesn't have to be like this. With smarter leadership and, indeed, more respect for the intensive work engineers put in to make websites and apps actually work, revenue driving platforms can become more secure, resilient and stable. Chaos engineering platform Gremlin publishes the 'true cost of downtime' This is the central argument of chaos engineering platform Gremlin, who we've covered a number of times this year. To coincide with Black Friday the team has put together what they believe is the 'true cost of downtime'. On the one hand this is a good marketing hook for their chaos engineering platform, but, cynicism aside, it's also a good explanation of why the principles of chaos engineering can be so valuable from both a business and developer perspective. Estimating the annual revenue of some of the biggest companies in the world, Gremlin has been then created an interactive table to demonstrate what the cost of downtime for each of those businesses would be, for the length of time you are on the page. For 20 minutes downtime, Amazon.com would have lost a staggering $4.4 million. For Walgreens it's more than $80,000. Gremlin provide some context to all this, saying: "Enterprise commerce businesses typically rely on a complex microservices architecture, from fulfillment, to website security, ability to scale with holiday traffic, and payment processing - there is a lot that can go wrong and impact revenue, damage customer trust, and consume engineering time. If an ecommerce site isn’t 100% online and performant, it’s losing revenue." "The holiday season is especially demanding for SREs working in ecommerce. Even the most skilled engineering teams can struggle to keep up with the demands of peak holiday traffic (i.e. Black Friday and Cyber Monday). Just going down for a few seconds can mean thousands in lost revenue, but for some sites, downtime can be exponentially more expensive." For Gremlin, chaos engineering is clearly the answer to many of the problems days like Black Friday poses. While it might not work for every single organization, it's nevertheless true that failing to pay attention to the value of your applications and websites at an hour by hour level could be incredibly damaging. With outages on Facebook, WhatsApp, and Instagram happening earlier this week, these problems aren't hidden away - they're in full view of the public. What does remain hidden, however, is the work and stress that goes in to tackling these issues and ensuring things are working as they should be. Perhaps it's time to start learning the lessons of Black Friday - business revenues will be that little bit healthier, but engineers will also be that little bit happier. 
Read more
  • 0
  • 0
  • 4972

article-image-chaos-conf-2018-recap-chaos-engineering-hits-maturity-as-community-moves-towards-controlled-experimentation
Richard Gall
12 Oct 2018
11 min read
Save for later

Chaos Conf 2018 Recap: Chaos engineering hits maturity as community moves towards controlled experimentation

Richard Gall
12 Oct 2018
11 min read
Conferences can sometimes be confusing. Even at the most professional and well-planned conferences, you sometimes just take a minute and think what's actually the point of this? Am I learning anything? Am I meant to be networking? Will anyone notice if I steal extra food for the journey home? Chaos Conf 2018 was different, however. It had a clear purpose: to take the first step in properly forging a chaos engineering community. After almost a decade somewhat hidden in the corners of particularly innovative teams at Netflix and Amazon, chaos engineering might feel that its time has come. As software infrastructure becomes more complex, less monolithic, and as business and consumer demands expect more of the software systems that have become integral to the very functioning of life, resiliency has never been more important but more challenging to achieve. But while it feels like the right time for chaos engineering, it hasn't quite established itself in the mainstream. This is something the conference host, Gremlin, a platform that offers chaos engineering as a service, is acutely aware of. On the hand it's actively helping push chaos engineering into the hands of businesses, but on the other its growth and success, backed by millions of VC cash (and faith), depends upon chaos engineering becoming a mainstream discipline in the DevOps and SRE worlds. It's perhaps this reason that the conference felt so important. It was, according to Gremlin, the first ever public chaos engineering conference. And while it was relatively small in the grand scheme of many of today's festival-esque conferences attended by thousands of delegates (Dreamforce, the Salesforce conference, was also running in San Francisco in the same week), the fact that the conference had quickly sold out all 350 of its tickets - with more hoping on waiting lists - indicates that this was an event that had been eagerly awaited. And with some big names from the industry - notably Adrian Cockcroft from AWS and Jessie Frazelle from Microsoft - Chaos Conf had the air of an event that had outgrown its insider status before it had even began. The renovated cinema and bar in San Francisco's Mission District, complete with pinball machines upstairs, was the perfect container for a passionate community that had grown out of the clean corporate environs of Silicon Valley to embrace the chaotic mess that resembles modern software engineering. Kolton Andrus sets out a vision for the future of Gremlin and chaos engineering Chaos Conf was quick to deliver big news. They keynote speech, by Gremlin co-founder Kolton Andrus launched Gremlin's brand new Application Level Fault Injection (ALFI) feature, which makes it possible to run chaos experiments at an application level. Andrus broke the news by building towards it with a story of the progression of chaos engineering. Starting with Chaos Monkey, the tool first developed by Netflix, and moving from infrastructure to network, he showed how, as chaos engineering has evolved, it requires and faciliates different levels of control and insight on how your software works. "As we're maturing, the host level failures and the network level failures are necessary to building a robust and resilient system, but not sufficient. We need more - we need a finer granularity," Andrus explains. This is where ALFI comes in. By allowing Gremlin users to inject failure at an application level, it allows them to have more control over the 'blast radius' of their chaos experiments. The narrative Andrus was setting was clear, and would ultimately inform the ethos of the day - chaos engineering isn't just about chaos, it's about controlled experimentation to ensure resiliency. To do that requires a level of intelligence - technical and organizational - about how the various components of your software work, and how humans interact with them. Adrian Cockcroft on the importance of historical context and domain knowledge Adrian Cockcroft (@adrianco) VP at AWS followed Andrus' talk. In it he took the opportunity to set the broader context of chaos engineering, highlighting how tackling system failures is often a question of culture - how we approach system failure and think about our software. Developers love to learn things from first principles" he said. "But some historical context and domain knowledge can help illuminate the path and obstacles." If this sounds like Cockcroft was about to stray into theoretical territory, he certainly didn't. He offered plenty of practical frameworks for thinking through potential failure. But the talk wasn't theoretical - Cockcroft offered a taxonomy of failure that provides a useful framework for thinking through potential failure at every level. He also touched on how he sees the future of resiliency evolving, focusing on: Observability of systems Epidemic failure modes Automation and continuous chaos The crucial point Cockcroft makes is that cloud is the big driver for chaos engineering. "As datacenters migrate to the cloud, fragile and manual disaster recovery will be replaced by chaos engineering" read one of his slides. But more than that, the cloud also paves the way for the future of the discipline, one where 'chaos' is simply an automated part of the test and deployment pipeline. Selling chaos engineering to your boss Kriss Rochefolle, DevOps engineer and author of one of the best selling DevOps books in French, delivered a short talk on how engineers can sell chaos to their boss. He takes on the assumption that a rational proposal, informed by ROI is the best way to sell chaos engineering. He suggests instead that engineers need to play into emotions, and presenting chaos engineer as a method for tackling and minimizing the fear of (inevitable failure. Follow Kriss on Twitter: @crochefolle Walmart and chaos engineering Vilas Veraraghavan, the Director of Engineering was keen to clarify that Walmart doesn't practice chaos. Rather it practices resiliency - chaos engineering is simply a method the organization uses to achieve that. It was particularly useful to note the entire process that Vilas' team adopts when it comes to resiliency, which has largely developed out of Vilas' own work building his team from scratch. You can learn more about how Walmart is using chaos engineering for software resiliency in this post. Twitter's Ronnie Chen on diving and planning for failure Ronnie Chen (@rondoftw) is an engineering manager at Twitter. But she didn't talk about Twitter. In fact, she didn't even talk about engineering. Instead she spoke about her experience as a technical diver. By talking about her experiences, Ronnie was able to make a number of vital points about how to manage and tackle failure as a team. With mortality rates so high in diving, it's a good example of the relationship between complexity and risk. Chen made the point that things don't fail because of a single catalyst. Instead, failures - particularly fatal ones - happen because of a 'failure cascade'. Chen never makes the link explicit, but the comparison is clear - the ultimate outcome (ie. success or failure) is impacted by a whole range of situational and behavioral factors that we can't afford to ignore. Chen also made the point that, in diving, inexperienced people should be at the front of an expedition. "If you're inexperienced people are leading, they're learning and growing, and being able to operate with a safety net... when you do this, all kinds of hidden dependencies reveal themselves... every undocumented assumption, every piece of ancient team lore that you didn't even know you were relying on, comes to light." Charity Majors on the importance of observability Charity Majors (@mipsytipsy), CEO of Honeycomb, talked in detail about the key differences between monitoring and observability. As with other talks, context was important: a world where architectural complexity has grown rapidly in the space of a decade. Majors made the point that this increase in complexity has taken us from having known unknowns in our architectures, to many more unknown unknowns in a distributed system. This means that monitoring is dead - it simply isn't sophisticated enough to deal with the complexities and dependencies within a distributed system. Observability, meanwhile, allows you to to understand "what's happening in your systems just by observing it from the outside." Put simply, it lets you understand how your software is functioning from your perspective - almost turning it inside out. Majors then linked the concept to observability to the broader philosophy of chaos engineering - echoing some of the points raised by Adrian Cockcroft in his keynote. But this was her key takeaway: "Software engineers spend too much time looking at code in elaborately falsified environments, and not enough time observing it in the real world." This leads to one conclusion - the importance of testing in production. "Accept no substitute." Tammy Butow and Ana Medina on making an impact Tammy Butow (@tammybutow) and Ana Medina  (@Ana_M_Medina) from Gremlin took us through how to put chaos engineering into practice - from integrating it into your organizational culture to some practical tests you can run. One of the best examples of putting chaos into practice is Gremlin's concept of 'Failure Fridays', in which chaos testing becomes a valuable step in the product development process, to dogfood it and test out how a customer experiences it. Another way which Tammy and Ana suggested chaos engineering can be used was as a way of testing out new versions of technologies before you properly upgrade in production. To end, their talk, they demo'd a chaos battle between EKS (Kubernetes on AWS) and AKS (Kubernetes on Azure), doing an app container attack, a packet loss attack and a region failover attack. Jessie Frazelle on how containers can empower experimentation Jessie Frazelle (@jessfraz) didn't actually talk that much about chaos engineering. However, like Ronnie Chen's talk, chaos engineering seeped through what she said about bugs and containers. Bugs, for Frazelle, are a way of exploring how things work, and how different parts of a software infrastructure interact with each other: "Bugs are like my favorite thing... some people really hate when they get one of those bugs that turns out to be a rabbit hole and your kind of debugging it until the end of time... while debugging those bugs I hate them but afterwards, I'm like, that was crazy!" This was essentially an endorsement of the core concept of chaos engineering - injecting bugs into your software to understand how it reacts. Jessie then went on to talk about containers, joking that they're NOT REAL. This is because they're made up of  numerous different component parts, like Cgroups, namespaces, and LSMs. She contrasted containers with Virtual machines, zones and jails, which are 'first class concepts' - in other worlds, real things (Jessie wrote about this in detail last year in this blog post). In practice what this means is that whereas containers are like Lego pieces, VMs, zones, and jails are like a pre-assembled lego set that you don't need to play around with in the same way. From this perspective, it's easy to see how containers are relevant to chaos engineering - they empower a level of experimentation that you simply don't have with other virtualization technologies. "The box says to build the death star. But you can build whatever you want." The chaos ends... Chaos Conf was undoubtedly a huge success, and a lot of credit has to go to Gremlin for organizing the conference. It's clear that the team care a lot about the chaos engineering community and want it to expand in a way that transcends the success of the Gremlin platform. While chaos engineering might not feel relevant to a lot of people at the moment, it's only a matter of time that it's impact will be felt. That doesn't mean that everyone will suddenly become a chaos engineer by July 2019, but the cultural ripples will likely be felt across the software engineering landscape. But without Chaos Conf, it would be difficult to see chaos engineering growing as a discipline or set of practices. By sharing ideas and learning how other people work, a more coherent picture of chaos engineering started to emerge, one that can quickly make an impact in ways people wouldn't have expected six months ago. You can watch videos of all the talks from Chaos Conf 2018 on YouTube.
Read more
  • 0
  • 0
  • 5516

article-image-log-monitoring-tools-for-continuous-security-monitoring-policy-tutorial
Fatema Patrawala
13 Jul 2018
11 min read
Save for later

Log monitoring tools for continuous security monitoring policy [Tutorial]

Fatema Patrawala
13 Jul 2018
11 min read
How many times we have heard of organization's entire database being breached and downloaded by the hackers. The irony is, they are not even aware about anything until the hacker is selling the database details on the dark web after few months. Even though they implement decent security controls, what they lack is continuous security monitoring policy. It is one of the most common things that you might find in a startup or mid-sized organization. In this article, we will show how to choose the right log monitoring tool to implement continuous security monitoring policy. You are reading an excerpt from the book Enterprise Cloud Security and Governance, written by Zeal Vora. Log monitoring is a must in security Log monitoring is considered to be part of the de facto list of things that need to be implemented in an organization. It gives us the power of visibility of various events through a single central solution so we don't have to end up doing less or tail on every log file of every server. In the following screenshot, we have performed a new search with the keyword not authorized to perform and the log monitoring solution has shown us such events in a nice graphical way along with the actual logs, which span across days: Thus, if we want to see how many permission denied events occurred last week on Wednesday, this will be a 2-minute job if we have a central log monitoring solution with search functionality. This makes life much easier and would allow us to detect anomalies and attacks in a much faster than traditional approach. Choosing the right log monitoring tool This is a very important decision that needs to be taken by the organization. There are both commercial offerings as well as open source offerings that are available today but the amount of efforts that need to be taken in each of them varies a lot. I have seen many commercial offerings such as Splunk and ArcSight being used in large enterprises, including national level banks. On the contrary, there are also open source offerings, such as ELK Stack, that are gaining popularity especially after Filebeat got introduced. At a personal level, I really like Splunk but it gets very expensive when you have a lot of data being generated. This is one of the reasons why many startups or mid-sized organizations use commercial offering along with open source offerings such as ELK Stack. Having said that, we need to understand that if you decide to go with ELK Stack and have a large amount of data, then ideally you would need a dedicated person to manage it. Just to mention, AWS also has a basic level of log monitoring capability available with the help of CloudWatch. Let's get started with logging and monitoring There will always be many sources from which we need to monitor logs. Since it will be difficult to cover each and every individual source, we will talk about two primary ones, which we will be discussing sequentially: VPC flow logs AWS Config VPC flow logs VPC flow logs is a feature that allows us to capture information related to IP traffic that goes to and from the network interfaces within the VPC. VPC flow logs help in both troubleshooting related to why certain traffic is not reaching the EC2 instances and also understanding what the traffic is that is accepted and rejected. The VPC flow logs can be part of individual network interface level of an EC2 instance. This allows us to monitor how many packets are accepted or rejected in a specific EC2 instance running in the DMZ maybe. By default, the VPC flow logs are not enabled, so we will go ahead and enable the VPC flow log within our VPC: Enabling flow logs for VPC: In our environment, we have two VPCs named Development and Production. In this case, we will enable the VPC flow logs for development VPC: In order to do that, click on the Development VPC and select the Flow Logs tab. This will give you a button named Create Flow Log. Click on it and we can go ahead with the configuration procedure: Since the VPC flow logs data will be sent to CloudWatch, we need to select the IAM Role that gives these permissions: Before we go ahead in creating our first flow log, we need to create the CloudWatch log group as well where the VPC flow logs data will go into. In order to do it, go to CloudWatch, select the Logs tab. Name the log group according to what you need and click on Create log group: Once we have created our log group, we can fill the Destination Log Group field with our log group name and click on the Create Flow Log button: Once created, you will see the new flow log details under the VPC subtab: Create a test setup to check the flow: In order to test if everything is working as intended, we will start our test OpenVPN instance and in the security group section, allow inbound connections on port 443 and icmp (ping). This gives us the perfect base for a plethora of attackers detecting our instance and running a plethora of attacks on our server: Analyze flow logs in CloudWatch: Before analyzing for flow logs, I went for a small walk so that we can get a decent number of logs when we examine; thus, when I returned, I began analyzing the flow logs data. If we observe the flow log data, we see plenty of packets, which have REJECT OK at the end as well as ACCEPT OK. Flow logs can be unto specific interface levels, which are attached to EC2 instances. So, in order to check the flow logs, we need to go to CloudWatch, select the Log Groups tab, inside it select the log group that we created and then select the interface. In our case, we selected the interface related to the OpenVPN instance, which we had started: CloudWatch gives us the capability to filter packets based on certain expressions. We can filter all the rejected packets by creating a simple search for REJECT OK in the search bar and CloudWatch will give us all the traffic that was rejected. This is shown in the following image: Viewing the logs in GUI: Plain text data is good but it's not very appealing and does not give you deep insights about what exactly is happening. It's always preferred to send these logs to a Log Monitoring tool, which can give you deep insights about what exactly is happening. In my case, I have used Splunk to give us an overview about the logs in our environment. When we look into VPC Flow Logs, we see that Splunk gives us great detail in a very nice GUI and also maps the IP addresses to the location from which the traffic is coming: The following image is the capture of VPC flow logs which are being sent to the Splunk dashboard for analyzing the traffic patterns: The VPC Flow Logs traffic rate and location-related data The top rejected destination and IP address, which we rejected AWS Config AWS Config is a great service that allows us to continuously assess and audit the configuration of the AWS-related resources. With AWS Config, we can exactly see what configuration has changed from the previous week to today for services such as EC2, security groups, and many more. One interesting feature that Config allows is to set the compliance test as shown in the following screenshots. We see that there is one rule that is failing and is considered non-compliant, which is the CloudTrail. There are two important features that Config service provides: Evaluate changes in resources over the timeline Compliance checks Once they are enabled and you have associated Config rules accordingly, then you would see a dashboard similar to the following screenshot: In the preceding screenshot, on the left-hand side, Config gives details related to the Resources, which are present in your AWS; and on the right-hand column, Config gives us the status if the resources are compliant or non-compliant according to the rules that are set. Configuring the AWS Config service Let's look into how we can get started with the AWS Config service and have great dashboards along with compliance checks, which we saw in the previous screenshot: Enabling the Config service: The first time when we want to start working with Config, we need to select the resources we want to evaluate. In our case, we will select both the region-specific resources as well as global resources such as IAM: Configure S3 and IAM: Once we decide to include all the resources, the next thing is to create an Amazon S3 bucket where AWS Config will store the configuration and snapshot files. We will also need to select IAM role, which will allow Config into put these files to the S3 bucket: Select Config rules: Configuration rules are checks against your AWS resources, which can be done and the result will be part of the compliance standard. For example, root-account-mfa-enabled rule will check whether the ROOT account has MFA enabled or disabled and in the end it will give you a nice graphical overview about the output of the checks conducted by the rules. Currently, there are 38 AWS-managed rules, which we can select and use anytime; however, we can have custom rules anytime as well. For our case, I will use five specific rules, which are as follows: cloudtrail-enabled iam-password-policy restricted-common-ports restricted-ssh root-account-mfa-enabled Config initialization: With the Config rules selected, we can click on Finish and AWS Config will start, and it will start to check resources and its associated rules. You might get the dashboard similar to the following screenshot, which speaks about the available resources as well as the rule compliance related graphs: Let's analyze the functionality For demo purposes, I decided to disable the CloudTrail service and if we then look into the Config dashboard, it says that one rule check has been failed: Instead of graphs, Config can also show the resources in a tabular manner if we want to inspect the Config rules with the associated names. This is illustrated in the following diagram: Evaluating changes to resources AWS Config allows us to evaluate the configuration changes that have been made to the resources. This is a great feature that allows us to see how our resource looked a day, a week, or even months back. This feature is particularly useful specifically during incidents when, during investigation, one might want to see what exactly changed before the incident took place. It will help things go much faster. In order to evaluate the changes, we will need to perform the following steps: Go to AWS Config | Resources. This will give you the Resource inventory page in which you can either search for resources based on the resource type or based on tags. For our use case, I am searching for a tag value for an EC2 Instance whose name is OpenVPN: When we go inside the Config timeline, we see the overall changes that have been made to the resource. In the following screenshot, we see that there were a few changes that were made, and Config also shows us the time the changes that were made to the resource: When we click on Changes, it will give you the exact detail on what was the exact change that was made. In our case, it is related to the new network interface, which was attached to the EC2 instance. It displays the network interface ID, description along with the IP address, and the security group, which is attached to that network interface: When we start to integrate the AWS services with Splunk or similar monitoring tools, we can get great graphs, which will help us evaluate things faster. On the side, we always have the logs from the CloudTrail, if we want to see the changes that occurred in detail. We covered log monitoring and how to choose the right log monitoring tool for continuous security monitoring policy. Check out the book Enterprise Cloud Security and Governance to build resilient cloud architectures for tackling data disasters with ease. Cloud Security Tips: Locking Your Account Down with AWS Identity Access Manager (IAM) Monitoring, Logging, and Troubleshooting Analyzing CloudTrail Logs using Amazon Elasticsearch
Read more
  • 0
  • 0
  • 3332
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-interactive-dashboard-with-vrealize-operations-manager
Vijin Boricha
03 Jul 2018
14 min read
Save for later

Interactive dashboard with vRealize Operations Manager [Tutorial]

Vijin Boricha
03 Jul 2018
14 min read
Creating a dashboard is a relatively simple exercise, creating a good dashboard will require tuning and some tweaking. The tricky part is displaying the information needed on a single screen, this is one of the biggest challenges when creating a dashboard; placing all the relevant information on a single pane of glass. The number one goal when creating a dashboard is to get all the information across in a glance. This is an excerpt from Mastering vRealize Operations Manager - Second Edition written by Spas Kaloferov, Scott Norris, Christopher Slater. Out of the 46 widgets vRealize Operations 6.6 has available, we will only use a handful of them regularly. The most commonly used widgets, from experience, are the scoreboard, metric selector, heat map, object list and metric chart. The rest are generally only used for specific use cases. There are basically two types of dashboards that we can create, an interactive dashboard or a static dashboard. An interactive dashboard is typically used for troubleshooting or similar activities where you are expecting the user to interact with widgets to get the information they are after. A static or display dashboard typically uses self-providing widgets such as scoreboards and heatmaps that are designed for display monitors, or other situations where an administrator is keeping an eye on environment changes. Each of the widgets has the ability to be a self-provider that means we set the information we want to display directly in the widget. The other option is to set up interactions and have other widgets provide information based on an object or metric selection in another widget. In this article, we will focus on the interactive dashboard. We will be looking at creating a dashboard that looks at vSphere cluster information, which at a glance will show us the overall health and general cluster information an administrator would need. Working through this will give you the knowledge needed to create any type of dashboard. The dashboard we are about to create will show how to configure the more common widgets in a way that can be replicated on a greater scale. When creating a dashboard, you will generally go through the following steps: Start the New Dashboard wizard from the Actions menu. Configure the general dashboard settings. Add and configure individual widgets. (Optional) Configure widget interactions. (Optional) Configure dashboard navigation. Creating an interactive dashboard You can create a dashboard by using the New Dashboard wizard. Alternatively, you can clone an existing dashboard and modify the clone. Perform the following steps to create a new dashboard: Navigate to the Dashboards page, click Actions, and then click Create Dashboard, as shown in the following screenshot: Under Dashboard Configuration, we need to give it a meaningful name and provide a description. If you click Yes for the Is default setting, the dashboard appears on the homepage when you log in. By default, the Recommendations dashboard is the dashboard that appears on the home page when a user logs in. You can change the default dashboard. Next, we click on Widget List to bring up all the available widgets. Here we will click and drag the widgets we need from the left pane to the right. We will use the following: Object List Metric Picker Metric Chart Generic Scoreboard Heat Map You can arrange widgets in the dashboard by dragging them to the desired column position. The left pane and the right pane are collapsed so that you have more room for your dashboard workspace, which is the center pane. To edit the widgets, we click on the little pen icon sitting at the top of the widget. The Object List The Object List widget configuration options, as shown in the following screenshot, include some of the more common options, such as Title, Refresh Content, and Refresh Interval. Options also exist that are specific to this widget: Mode: You can select Self, Children, or Parent mode. This setting is used by widget interactions. Auto Select First Row: This enables you to select whether or not to start with the first row of data. Select which tags to filter: This enables you to select objects from an object tree to observe. For example, you can choose to observe information about objects managed by the vCenter Server instance named VCVA01. You can add different metrics using the Additional Column option during widget configuration. Using the Additional Column pane, you can add metrics that are specific for each object in the data grid columns. Perform the following steps to edit the Object List widget in our example dashboard: Click this on Object List and the Edit Object List window will appear. In here edit the Title, select On for Auto Select First Row, and select Cluster Compute Resource under Object Types Tag. We should have something similar to the following screenshot. Click Save to continue. With tag selection, multiple tags can be selected, if this is done then only objects that fall under both tag types will be shown in the widget. The next thing we want to do is click on Widget Interactions on the left pane; this is where we go to link the widgets, for example, we select a virtual machine from an object list widget and it would change any linked widgets to display the information of that object. We will see a Selected Object(s) with a drop-down list followed by a green arrow pointing to our widgets. This is saying that what we select in the drop-down list will be linked to the associated widget. Here our new Cluster List will feed Metric Picker, Scoreboard, and Heatmap, while Metric Picker will feed Metric Chart. Also we will notice that a widget like Metric Chart can be fed by more than one widget. Click APPLY INTERACTIONS and we should end up with something similar to the following screenshot: The Metric Picker Now, if we select a metric under the Metric Picker widget it should show the metric in the Metric Chart widget, as displayed in the following screenshot: Metric Picker will contain all the available metrics for the selected object, such as an ESXi host or a Virtual Machine. The Heatmap Next up, we will edit the Heatmap widget. For this example, we will use the Heatmap widget to display capacity remaining for the datastores attached to the vSphere cluster. This is the best way to see at a glance that none of the datastores are over 90% used or getting close. We need to make the following changes: Give the widget a new name describing what we are trying to achieve. Change Group By to Cluster Compute Resource - This is what we want the parent container to be. Change mode to Instance - This mode type is best used when interacting with other widgets to get its objects. Change Object Type to Datastore - This is the object that we want displayed. Change Attribute Kind to Disk Space | Capacity Remaining (%) - The metric of the object that we want to use. Change Colors around to 0 (Min), 20 (Max) - Because we really only want to know if a datastore is getting close to the threshold, minimizing the range will give us more granular colors. Change the colors around making it red on the left and green on the right. This is done by clicking on the little color square at each end and picking a new color. The reason this is done is because we have capacity remaining, so we need 0% remaining as red. Click Save and we should now have something similar to the following screenshot, with each square representing a datastore: Move the mouse over each box to reveal more detail of the object. The Scoreboard Time to modify the last widget. This one will be a little more complicated due to how we display what we want while being interactive. When we configured the widget interactions, we noticed that the scoreboard widget was populated automatically with a bunch of metrics, as shown in the following screenshot: Now, let's go back to our dashboard creation and edit the Scoreboard widget. We will notice quite a lot of configuration options compared to others, most of which are how the boxes are laid out, such as number of columns, box size, and rounding out decimals. What we want to do for this widget is: Name the scoreboard something meaningful Round the decimals to 1 - this cuts down the amount of decimal places returned on the displayed value Under Metric Configuration choose the Host-Util file from the drop-down list We should now see something similar to the following screenshot: But what about the object selection you may have noticed in the lower half of the scoreboards widget? These are only used if we make the widget a self-provider, which we can see as an option to the top left of the edit window. We can choose objects and metrics, but they are ignored when Self Provider is set to Off. If we now click Save we should see the new configuration of the scoreboard widget, as shown in the following screenshot:  I’ve also changed the Visual Theme to Original in the scoreboard widget configuration options to change the way the scoreboard visualizes the information. The scoreboard widget may not always display the information we necessarily need. To get the widget to display the information we want while continuing to be interactive to our selections in the Cluster List widgets, we have to create a metric configuration (XML) file. Metric Configuration Files (XML) A lot of the widgets are edited through the GUI with the objects and metrics we want displayed, but some require a metric configuration file to define what metrics the widget should display. Metric configuration files can create a custom set of metrics for the customization of supported widgets with meaningful data. Metric configuration files store the metric attribute keys in XML format. These widgets support customization using metric configuration files: Scoreboard Metric Chart Property List Rolling View Chart Sparkline Chart Topology Graph To keep this simple, we will configure four metrics to be displayed, which are: CPU usage for the cluster in % CPU demand for the cluster Memory ballooning CPU usage for the cluster in MHz Perform the following steps to create a metric configuration file: Open a text editor, add the following code, and save it as an XML file; in this case we will call it clusterexample.xml: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <AdapterKinds> <AdapterKind adapterKindKey="VMWARE"> <ResourceKind resourceKindKey="ClusterComputeResource"> <Metric attrkey="cpu|capacity_usagepct_average" label="CPU" unit="%" yellow="50" orange="75" red="90" /> <Metric attrkey="cpu|demandPct" label="CPU Demand" unit="%" yellow="50" orange="75" red="90" /> <Metric attrkey="cpu|usagemhz_average" label="CPU Usage" unit="GHz" yellow="8" orange="16" red="20" /> <Metric attrkey="mem|vmmemctl_average" label="Balloon Mem" unit="GB" yellow="100" orange="150" red="200" /> </ResourceKind> </AdapterKind> </AdapterKinds> Using WinSCP or another similar product, upload this file to the following location on the vRealize Operations 6.0 virtual appliance: /usr/lib/vmware-vcops/tomcat-web-app/webapps/vcops-web-ent/WEB-INF/classes/resources/reskndmetrics In this location, you will notice some built in sample XML files. Alternatively, you can create the XML file from the vRealize Operations user interface. To do so, navigate to Administration | Configuration, and then Metric Configuration. Now let's go back to our dashboard creation and edit the Scoreboard widget. Under Metric Configuration choose the clusterexmaple.xml file that we just created from the drop-down list. Click Save to save the configuration. We have now completed the new dashboard; click Save on the bottom right to save the dashboard. We can go back and edit this dashboard whenever we need to. This new dashboard will now be available on the home page, this is shown in the following screenshot: For the Scoreboard widget we have used an XML file so the widget will display the metrics we would like to see when an object is selected in another widget. How can we get the correct metric and adapter names to be used in this file? Glad you asked. The simplest way to get the correct information we need for that XML file is to create a non-interactive dashboard with the widget we require with all the information we want to display for our interactive one. For example, let's quickly create a temp dashboard with only one scoreboard widget and populate it with what we want by manually selecting the objects and metrics with self-provider set to yes: Create another dashboard and drag and drop a single scoreboard widget. Edit the scoreboard widget and configure it with all the information we would like. Search for an object in the middle pane and select the widgets we want in the right pane. Configure the box label and Measurement Unit. A thing to note here is that we have selected memory balloon metric as shown in the following screenshot, but we have given it a label of GB. This is because of a new feature in 6.0 it will automatically upscale the metrics when shown on a scoreboard, this also goes for datastore GB to TB, CPU MHz to GHz, and network throughput from KBps to MBps. Typically in 5.x we would create super metrics to make this happen. The downside to this is that the badge color still has to be set in the metrics base format. Save this dashboard once we have the metrics we want. Locate it under our dashboard list and select it, click on the little cog, and select Export Dashboards as shown in the following screenshot. This will automatically download a file called Dashboard<Date>.json. Open this file in a text editor and have a look through it and we will see all the information we require to write our XML interaction file. First off is our resourceKindKey and adapterKindKey, as shown in the following screenshot. These are pretty self-explanatory, resourceKind being Cluster resource, and adapter is the adapter that's collecting the metrics, in this case the inbuilt vCenter one called VMWARE. Next are our resources, as we can see from the following screenshot we have metricKey, which is the most important one as well as the color settings, unit, and the label: There it is, how we can get the information we require for XML files: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <AdapterKinds> <AdapterKind adapterKindKey="VMWARE"> <ResourceKind resourceKindKey="ClusterComputeResource"> <Metric attrkey="cpu|capacity_usagepct_average" label="CPU" unit="%" yellow="50" orange="75" red="90" /> <Metric attrkey="cpu|demandPct" label="CPU Demand" unit="%" yellow="50" orange="75" red="90" /> <Metric attrkey="cpu|usagemhz_average" label="CPU Usage" unit="GHz" yellow="8" orange="16" red="20" /> <Metric attrkey="mem|vmmemctl_average" label="Balloon Mem" unit="GB" yellow="100" orange="150" red="200" /> </ResourceKind> </AdapterKind> </AdapterKinds> Any widget with the setting Metric Configuration available can use the XML files you create. The XML format is as per the preceding code. An XML file can also have multiple Adapter kinds as there could be different adapter metrics that you require. Today, we learned to create a dashboard that is interactive based on selections made within widgets. You also unraveled the mystery of the metric configuration XML file and how to get the information you require into it. To know more about Super metrics and when to use it, check out this book Mastering vRealize Operations Manager - Second Edition. What to expect from vSphere 6.7 How to ace managing the Endpoint Operations Management Agent with vROps Troubleshooting techniques in vRealize Operations components
Read more
  • 0
  • 0
  • 10168

article-image-troubleshooting-in-vrealize-operations-components
Vijin Boricha
06 Jun 2018
11 min read
Save for later

Troubleshooting techniques in vRealize Operations components

Vijin Boricha
06 Jun 2018
11 min read
vRealize Operations Manager helps in managing the health, efficiency and compliance of virtualized environments. In this tutorial, we have covered the latest troubleshooting techniques of vRealize Operations. Let's see what we can do to monitor and troubleshoot the key vRealize Operations components and services. This is an excerpt from Mastering vRealize Operations Manager - Second Edition written by Spas Kaloferov, Scott Norris, and Christopher Slater. Troubleshooting Services Let's first take a look into some of the most important vRealize Operations services. The Apache2 service vRealize Operations internally uses the Apache2 web server to host Admin UI, Product UI, and Suite API. Here are some useful service log locations: Log file nameLocationPurposeaccess_log & error_logstorage/log/var/log/apache2 Stores log information for the Apache service The Watchdog service Watchdog is a vRealize Operations service that maintains the necessary daemons/services and attempts to restart them as necessary should there be any failure. The vcops-watchdog is a Python script that runs every 5 minutes by means of the vcops-watchdog-daemon with the purpose of monitoring the various vRealize Operations services, including CaSA. The Watchdog service performs the following checks: PID file of the service Service status Here are some useful service log locations: Log file nameLocationPurposevcops-watchdog.log/usr/lib/vmware-vcops/user/log/vcops-watchdog/Stores log information for the WatchDog service The Collector service The Collector service sends a heartbeat to the controller every 30 seconds. By default, the Collector will wait for 30 minutes for adapters to synchronize. The collector properties, including enabling or disabling Self Protection, can be configured from the collector.properties properties file located in /usr/lib/vmware-vcops/user/conf/collector. Here are some useful service log locations: Log file nameLocationPurposecollector.log/storage/log/vcops/log/Stores log information for the Collector service The Controller service The Controller service is part of the analytics engine. The controller does the decision making on where new objects should reside. The controller manages the storage and retrieval of the inventory of the objects within the system. The Controller service has the following uses: It will monitor the collector status every minute How long a deleted resource is available in the inventory How long a non-existing resource is stored in the database Here are some useful service file locations: Log file nameLocationPurposecontroller.properties/usr/lib/vmware-vcops/user/conf/controller/Stores properties information for the Controller service Databases As we learned, vRealize Operations contains quite a few databases, all of which are of great importance for the function of the product. Let’s take a deeper look into those databases. Cassandra DB Currently, Cassandra DB stores the following information: User Preferences and Config Alerts definition Customizations Dashboards, Policies, and View Reports and Licensing Shard Maps Activities Cassandra stores all the information which we see in the content folder; basically, any settings which are applied globally. You are able to log into the Cassandra database from any Analytic Node. The information is the same across nodes. There are two ways to connect to the Cassandra database: cqlshrc is a command-line tool used to get the data within Cassandra, in a SQL-like fashion (inbuilt). To connect to the DB, run the following from the command line (SSH): $VMWARE_PYTHON_BIN $ALIVE_BASE/cassandra/apache-cassandra-2.1.8/bin/cqlsh --ssl --cqlshrc $ALIVE_BASE/user/conf/cassandra/cqlshrc Once you are connected to the DB, we need to navigate to the globalpersistence key space using the following command: vcops_user@cqlsh&gt; use globalpersistence ; The nodetool command-line tool Once you are logged on to the Cassandra DB, we can run the following commands to see information: Command syntaxPurposeDescribe tablesTo list all the relation (tables) in the current database instanceDescribe &lt;table_name&gt;To list the content of that particular tableExitTo exit the Cassandra command lineselect commandTo select any Column data from a tabledelete commandTo delete any Column data from a table Some of the important tables in Cassandra are: Table namePurposeactivity_2_tblStores all the activitiestbl_2a8b303a3ed03a4ebae2700cbfae90bfStores the Shard mapping information of an object (table name may be differ in each environment)supermetricStores the defined super metricspolicyStores all the defined policiesAuthStores all the user details in the clusterglobal_settingsAll the configured global settings are stored herenamespace_to_classtypeInforms what type of data is stored in what table under CassandrasymptomproblemdefinitionAll the defined symptomscertificatesStores all the adapter and data source certificates The Cassandra database has the following configuration files: File typeLocationCassandra.yaml/usr/lib/vmware-vcops/user/conf/cassandra/vcops_cassandra.properties/usr/lib/vmware-vcops/user/conf/cassandra/Cassandra conf scripts/usr/lib/vmware_vcopssuite/utilities/vmware/vcops/cassandra/ The Cassandra.yaml file stores certain information such as the default location to save data (/storage/db/vcops/cassandra/data). The file contains information about all the nodes. When a new node joins the cluster, it refers to this file to make sure it contacts the right node (master node). It also has all the SSL certificate information. Cassandra is started and stopped via the CaSA service, but just because CaSA is running does not mean that the Cassandra service is necessarily running. The service command to check the status of the Cassandra DB service from the command line (SSH) is: service vmware-vcops status cassandra The Cassandra cassandraservice.Sh is located in: $VCOPS_BASE/cassandra/bin/ Here are some useful Cassandra DB log locations: Log File NameLocationPurposeSystem.log/usr/lib/vmware-vcops/user/log/cassandraStores all the Cassandra-related activitieswatchdog_monitor.log/usr/lib/vmware-vcops/user/log/cassandraStores Cassandra Watchdog logswrapper.log/usr/lib/vmware-vcops/user/log/cassandraStores Cassandra start and stop-related logsconfigure.log /usr/lib/vmware-vcops/user/log/cassandraStores logs related to Python scripts of vRopsinitial_cluster_setup.log/usr/lib/vmware-vcops/user/log/cassandraStores logs related to Python scripts of vRopsvalidate_cluster.log/usr/lib/vmware-vcops/user/log/cassandraStores logs related to Python scripts of vRops To enable debug logging for the Cassandra database, edit the logback.xml XML file located in: /usr/lib/vmware-vcops/user/conf/cassandra/ Change &lt;root level="INFO"&gt; to &lt;root level="DEBUG"&gt;. The System.log will not show the debug logs for Cassandra. If you experience any of the following issues, this may be an indicator that the database has performance issues, and so you may need to take the appropriate steps to resolve them: Relationship changes are not be reflected immediately Deleted objects still show up in the tree View and Reports takes slow in processing and take a long time to open. Logging on to the Product UI takes a long time for any user Alert was supposed to trigger but it never happened Cassandra can tolerate 5 ms of latency at any given point of time. You can validate the cluster by running the following command from the command line (SSH). First, navigate to the following folder: /usr/lib/vmware-vcopssuite/utilities/ Run the following command to validate the cluster: $VMWARE_PYTHON_BIN -m vmware.vcops.cassandra.validate_cluster &lt;IP_ADDRESS_1 IP_ADDRESS_2&gt; You can also use the nodetool to perform a health check, and possibly resolve database load issues. The command to check the load status of the activity tables in vRealize Operations version 6.3 and older is as follows: $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool --port 9008 status For the 6.5 release and newer, VMware added the requirement of using a 'maintenanceAdmin' user along with a password file. The new command is as follows: $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool -p 9008 --ssl -u maintenanceAdmin --password-file /usr/lib/vmware-vcops/user/conf/jmxremote.password status Regardless of which method you choose to perform the health check, if any of the nodes have over 600 MB of load, you should consult with VMware Global Support Services on the next steps to take, and how to elevate the load issues. Central (Repl DB) The Postgres database was introduced in 6.1. It has two instances in version 6.6. The Central Postgres DB, also called repl, and the Alerts/HIS Postgres DB, also called data, are two separate database instances under the database called vcopsdb. The central DB exists only on the master and the master replica nodes when HA is enabled. It is accessible via port 5433 and it is located in /storage/db/vcops/vpostgres/repl. Currently, the database stores the Resources inventory. You can connect to the central DB from the command line (SSH). Log in on the analytic node you wish to connect to and run: su - postgres The command should not prompt for a password if ran as root. Once logged in, connect to the database instance by running: /opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5433 The service command to start the central DB from the command line (SSH) is: service vpostgres-repl start Here are some useful log locations: Log file nameLocationPurposepostgresql-&lt;xx&gt;.log/storage/db/vcops/vpostgres/data/pg_logProvides information on Postgres database cleanup and other disk-related information Alerts/HIS (Data) DB The Alerts DB is called data on all the data nodes including the master and master replica node. It was again introduced in 6.1. Starting from 6.2, the  Historical Inventory Service xDB was merged with the Alerts DB. It is accessible via port 5432 and it is located in /storage/db/vcops/vpostgres/data. Currently, the database stores: Alerts and alarm history History of resource property data History of resource relationship You can connect to the Alerts DB from the command line (SSH). Log in on the analytic node you wish to connect to and run: su - postgres The command should not prompt for a password if ran as root. Once logged in, connect to the database instance by running: /opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5432 The service command to start the Alerts DB from the command line (SSH) is: service vpostgres start FSDB The File System Database (FSDB) contains all raw time series metrics and super metrics data for the discovered resources. What is FSDB in vRealize Operations Manager?: FSDB is a GemFire server and runs inside analytics JVM. FSDB in vRealize Operations uses the Sharding Manager to distribute data between nodes (new objects). (We will discuss what vRealize Operations cluster nodes are later in this chapter.) The File System Database is available in all the nodes of a vRops Cluster deployment. It has its own properties file. FSDB stores data (time series data ) collected by adapters and data which is generated/calculated (system, super, badge, CIQ metrics, and so on) based on analysis of that data. If you are troubleshooting FSDB performance issues, you should start from the Self Performance Details dashboard, more precisely, the FSDB Data Processing widget. We covered both of these earlier in this chapter. You can also take a look at the metrics provided by the FSDB Metric Picker: You can access it by navigating to the Environment tab, vRealize Operations Clusters, selecting a node, and selecting vRealize Operations Manager FSDB. Then, select the All Metrics tab. You can check the synchronization state of the FSDB to determine the overall health of the cluster by running the following command from the command line (SSH): $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo By restarting the FSDB, you can trigger synchronization of all the data by getting missing data from other FSDBs. Synchronization takes place only when you have vRealize Operations HA configured. Here are some useful log locations for FSDB: Log file nameLocaitonPurposeAnalytics-&lt;UUID&gt;.log/usr/lib/vmware-vcops/user/log Used to check the Sharding Module functionality Can trace which objects have synced ShardingManager_&lt;UUID&gt;.log/usr/lib/vmware-vcops/user/logCan be used to get the total time the sync tookfsdb-accessor-&lt;UUID&gt;.log/usr/lib/vmware-vcops/user/logProvides information on FSDB database cleanup and other disk-related information Platform-cli Platform-cli is a tool by which we can get information from various databases, including the GemFire cache, Cassandra, and the Alerts/HIS persistence databases. In order to run this Python script, you need to run the following command: $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py The following example of using this command will list all the resources in ascending order and also show you which shard it is stored on: $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getResourceToShardMapping We discussed how to troubleshoot some of the most important components like services and databases along with troubleshooting failures in the upgrade process. To know more about self-monitoring dashboards and infrastructure compliance, check out this book Mastering vRealize Operations Manager - Second Edition. What to expect from vSphere 6.7 Introduction to SDN – Transformation from legacy to SDN Learning Basic PowerCLI Concepts
Read more
  • 0
  • 0
  • 18356

article-image-endpoint-operations-management-agent-vrealize-operations
Vijin Boricha
30 May 2018
10 min read
Save for later

How to ace managing the Endpoint Operations Management Agent with vROps

Vijin Boricha
30 May 2018
10 min read
In this tutorial, you will learn the functionalities of endpoint operations management agent and how you can effectively manage it. You can install the Endpoint Operations Management Agent from a tar.gz or .zip archive, or from an operating system-specific installer for Windows or for Linux-like systems that support RPM. This is an excerpt from Mastering vRealize Operations Manager - Second Edition written by Spas Kaloferov, Scott Norris, Christopher Slater.  The agent can come bundled with JRE or without JRE. Installing the Agent Before you start, you have to make sure you have vRealize Operations user credentials to ensure that you have enough permissions to deploy the agent. Let's see how can we install the agent on both Linux and Windows machines. Manually installing the Agent on a Linux Endpoint Perform the following steps to install the Agent on a Linux machine: Download the appropriate distributable package from https://my.vmware.com and copy it over to the machine. The Linux .rpm file should be installed with the root credentials. Log in to the machine and run the following command to install the agent: [root]# rpm -Uvh <DistributableFileName> In this example, we are installing on CentOS. Only when we start the service will it generate a token and ask for the server's details. Start the Endpoint Operations Management Agent service by running: service epops-agent start The previous command will prompt you to enter the following information about the vRealize Operations instance: vRealize Operations server hostname or IP: If the server is on the same machine as the agent, you can enter the localhost. If a firewall is blocking traffic from the agent to the server, allow the traffic to that address through the firewall. Default port: Specify the SSL port on the vRealize Operations server to which the agent must connect. The default port is 443. Certificate to trust: You can configure Endpoint Operations Management agents to trust the root or intermediate certificate to avoid having to reconfigure all agents if the certificate on the analytics nodes and remote collectors are modified. vRealize Operations credentials: Enter the name of a vRealize Operations user with agent manager permissions. The agent token is generated by the Endpoint Operations Management Agent. It is only created the first time the endpoint is registered to the server. The name of the token is random and unique, and is based on the name and IP of the machine. If the agent is reinstalled on the Endpoint, the same token is used, unless it is deleted manually. Provide the necessary information, as shown in the following screenshot: Go into the vRealize Operations UI and navigate to Administration | Configuration | Endpoint Operations, and select the Agents tab: After it is deployed, the Endpoint Operations Management Agent starts sending monitoring data to vRealize Operations, where the data is collected by the Endpoint Operations Management adapter. You can also verify the collection status of the Endpoint Operations Management Agent for your virtual machine on the Inventory Explorer page. If you encounter issues during the installation and configuration, you can check the /opt/vmware/epops-agent/log/agent.log log file for more information. We will examine this log file in more detail later in this book. Manually installing the agent on a Windows Endpoint On a Windows machine, the Endpoint Operations Management Agent is installed as a service. The following steps outline how to install the Agent via the .zip archive bundle, but you can also install it via an executable file. Perform the following steps to install the Agent on a Windows machine: Download the agent .zip file. Make sure the file version matches your Microsoft Windows OS. Open Command Prompt and navigate to the bin folder within the extracted folder. Run the following command to install the agent: ep-agent.bat install After the install, is complete, start the agent by executing: ep-agent.bat start If this is the first time you are installing the agent, a setup process will be initiated. If you have specified configuration values in the agent properties file, those will be used. Upon starting the service, it will prompt you for the same values as it did when we installed in on a Linux machine. Enter the following information about the vRealize Operations instance: vRealize Operations server hostname or IP Default port Certificate to trust vRealize Operations credentials If the agent is reinstalled on the endpoint, the same token is used, unless it is deleted manually. After it is deployed, the Endpoint Operations Management Agent starts sending monitoring data to vRealize Operations, where the data is collected by the Endpoint Operations Management adapter. You can verify the collection status of the Endpoint Operations Management Agent for your virtual machine on the Inventory Explorer page: Because the agent is collecting data, the collection status of the agent is green. Alternatively, you can also verify the collection status of the Endpoint Operations Management Agent by navigating to Administration | Configuration | Endpoint Operations, and selecting the Agents tab. Automated agent installation using vRealize Automation In a typical environment, the Endpoint Operations Management Agent will not be installed on many/all virtual machines or physical servers. Usually, only those servers holding critical applications or services will be monitored using the agent. Of those servers which are not, manually installing and updating the Endpoint Operations Management Agent can be done with reasonable administrative effort. If we have an environment where we need to install the agent on many VMs, and we are using some kind of provisioning and automation engine like Microsoft System Center Configuration Manager or VMware vRealize Automation, which has application-provisioning capabilities, it is not recommended to install the agent in the VM template that will be cloned. If for whatever reason you have to do it, do not start the Endpoint Operations Management Agent, or remove the Endpoint Operations Management token and data before the cloning takes place. If the Agent is started before the cloning, a client token is created and all clones will be the same object in vRealize Operations. In the following example, we will show how to install the Agent via vRealize Automation as part of the provisioning process. I will not go into too much detail on how to create blueprints in vRealize Automation, but I will give you the basic steps to create a software component that will install the agent and add it to a blueprint. Perform the following steps to create a software component that will install the Endpoint Operations Management Agent and add it to a vRealize Automation Blueprint: Open the vRealize Automation portal page and navigate to Design, the Software Components tab, and click New to create a new software component. On the General tab, fill in the information. Click Properties and click New . Add the following six properties, as shown in the following table: Name Type Value Encrypted Overridable Required Computed RPMNAME String The name of the Agent RPM package No Yes No No SERVERIP String The IP of the vRealize Operations server/cluster No Yes No No SERVERLOGIN String Username with enough permission in vRealize Operations No Yes No No SERVERPWORD String Password for the user No Yes No No SRVCRTTHUMB String The thumbprint of the vRealize Operations certificate No Yes No No SSLPORT String The port on which to communicate with vRealize Operations No Yes No No These values will be used to configure the Agent once it is installed. The following screenshot illustrates some example values: Click Actions and configure the Install, Configure, and Start actions, as shown in the following table: Life Cycle Stage Script Type Script Reboot Install Bash rpm -i http://<FQDNOfThe ServerWhere theRPMFileIsLocated>/$RPMNAME No (unchecked) Configure Bash sed -i "s/#agent.setup.serverIP=localhost/agent.setup.serverIP=$SERVERIP/" /opt/vmware/epops-agent/conf/agent.properties sed -i "s/#agent.setup.serverSSLPort=443/agent.setup.serverSSLPort=$SSLPORT/" /opt/vmware/epops-agent/conf/agent.properties sed -i "s/#agent.setup.serverLogin=username/agent.setup.serverLogin=$SERVERLOGIN/" /opt/vmware/epops-agent/conf/agent.properties sed -i "s/#agent.setup.serverPword=password/agent.setup.serverPword=$SERVERPWORD/" /opt/vmware/epops-agent/conf/agent.properties sed -i "s/#agent.setup.serverCertificateThumbprint=/agent.setup.serverCertificateThumbprint=$SRVCRTTHUMB/" /opt/vmware/epops-agent/conf/agent.properties No (unchecked) Start Bash /sbin/service epops-agent start /sbin/chkconfig epops-agent on No (unchecked) The following screenshot illustrates some example values: Click Ready to Complete and click Finish. Edit the blueprint to where you want the agent to be installed. From Categories, select Software Components. Select the Endpoint Operations Management software component and drag and drop it over the machine type (virtual machine) where you want the object to be installed: Click Save to save the blueprint. This completes the necessary steps to automate the Endpoint Operations Management Agent installation as a software component in vRealize Automation. With every deployment of the blueprint, after cloning, the Agent will be installed and uniquely identified to vRealize Operations. Reinstalling the agent When reinstalling the agent, various elements are affected, like: Already-collected metrics Identification tokens that enable a reinstalled agent to report on the previously-discovered objects on the server The following locations contain agent-related data which remains when the agent is uninstalled: The data folder containing the keystore The epops-token platform token file which is created before agent registration Data continuity will not be affected when the data folder is deleted. This is not the case with the epops-token file. If you delete the token file, the agent will not be synchronized with the previously discovered objects upon a new installation. Reinstalling the agent on a Linux Endpoint Perform the following steps to reinstall the agent on a Linux machine: If you are removing the agent completely and are not going to reinstall it, delete the data directory by running: $ rm -rf /opt/epops-agent/data If you are completely removing the client and are not going to reinstall or upgrade it, delete the epops-token file: $ rm /etc/vmware/epops-token Run the following command to uninstall the Agent: $ yum remove <AgentFullName> If you are completely removing the client and are not going to reinstall it, make sure to delete the agent object from the Inventory Explorer in vRealize Operations. You can now install the client again, the same way as we did earlier in this book. Reinstalling the Agent on a Windows Endpoint Perform the following steps to reinstall the Agent on a Windows machine: From the CLI, change the directory to the agent bin directory: cd C:epops-agentbin Stop the agent by running: C:epops-agentbin> ep-agent.bat stop Remove the agent service by running: C:epops-agentbin> ep-agent.bat remove If you are completely removing the client and are not going to reinstall it, delete the data directory by running: C:epops-agent> rd /s data If you are completely removing the client and are not going to reinstall it, delete the epops-token file using the CLI or Windows Explorer: C:epops-agent> del "C:ProgramDataVMwareEP Ops agentepops-token" Uninstall the agent from the Control Panel. If you are completely removing the client and are not going to reinstall it, make sure to delete the agent object from the Inventory Explorer in vRealize Operations. You can now install the client again, the same way as we did earlier in this book. You've become familiar with the Endpoint Operations Management Agent and what it can do. We also discussed how to install and reinstall it on both Linux and Windows operating systems. To know more about vSphere and vRealize automation workload placement, check out our book  Mastering vRealize Operations Manager - Second Edition. VMware vSphere storage, datastores, snapshots What to expect from vSphere 6.7 KVM Networking with libvirt
Read more
  • 0
  • 0
  • 4273

article-image-vmware-vsphere-storage-datastores-snapshots
Packt
21 Feb 2018
9 min read
Save for later

VMware vSphere storage, datastores, snapshots

Packt
21 Feb 2018
9 min read
VMware vSphere storage, datastores, snapshotsIn this article, byAbhilash G B, author of the book,VMware vSphere 6.5 CookBook - Third Edition, we will cover the following:Managing VMFS volumes detected as snapshotsCreating NFSv4.1 datastores with Kerberos authenticationEnabling storage I/O control (For more resources related to this topic, see here.) IntroductionStorage is an integral part of any infrastructure. It is used to store the files backing your virtual machines. The most common way to refer to a type of storage presented to a VMware environment is based on the protocol used and the connection type. NFS are storage solutions that can leverage the existing TCP/IP network infrastructure. Hence, they are referred to as IP-based storage. Storage IO Control (SIOC) is one of the mechanisms to use ensure a fair share of storage bandwidth allocation to all Virtual Machines running on shared storage, regardless of the ESXi host the Virtual Machines are running on. Managing VMFS volumes detected as snapshotsSome environments maintain copies of the production LUNs as a backup, by replicating them. These replicas are exact copies of the LUNs that were already presented to the ESXi hosts. If for any reason a replicated LUN is presented to an ESXi host, then the host will not mount the VMFS volume on the LUN. This is a precaution to prevent data corruption. ESXi identifies each VMFS volume using its signature denoted by aUniversally Unique Identifier (UUID). The UUID is generated when the volume is first created or resignatured and is stored in the LVM header of the VMFS volume. When an ESXi host scans for new LUN ;devices and VMFS volumes on it, it compares the physical device ID (NAA ID) of the LUN with the device ID (NAA ID) value stored in the VMFS volumes LVM header. If it finds a mismatch, then it flags the volume as a snapshot volume.Volumes detected as snapshots are not mounted by default. There are two ways to mount such volumes/datastore:Mount by Keeping the Existing Signature Intact - This is used when you are attempting to temporarily mount the snapshot volume on an ESXi that doesn't see the original volume. If you were to attempt mounting the VMFS volume by keeping the existing signature and if the host sees the original volume, then you will not be allowed to mount the volume and will be warned about the presence of another VMFS volume with the same UUID:Mount by generating a new VMFS Signature - This has to be used if you are mounting a clone or a snapshot of an existing VMFS datastore to the same host/s. The process of assigning a new signature will not only update the LVM header with the newly generated UUID, but all the Physical Device ID (NAA ID) of the snapshot LUN. Here, the VMFS volume/datastore will be renamed by prefixing the wordsnap followed by a random number and the name of the original datastore: Getting readyMake sure that the original datastore and its LUN is no longer seen by the ESXi host the snapshot is being mounted to. How to do it...The following procedure will help mount a VMFS volume from a LUN detected as a snapshot:Log in to the vCenter Server using the vSphere Web Client and use the key combination Ctrl+Alt+2 to switch to the Host and Clusters view.Right click on the ESXi host the snapshot LUN is mapped to and go to Storage | New Datastore.On the New Datastore wizard, select VMFS as the filesystem type and click Next to continue.On the Name and Device selection screen, select the LUN detected as a snaphsot and click Next to continue:On the Mount Option screen, choose to either mount by assigning a new signature or by keeping the existing signature, and click Next to continue:On the Ready to Complete screen, review the setting and click Finish to initiate the operation. Creating NFSv4.1 datastores with Kerberos authenticationVMware introduced support for NFS 4.1 with vSphere 6.0. The vSphere 6.5 added several enhancements:It now supports AES encryptionSupport for IP version 6Support Kerberos's integrity checking mechanismHere, we will learn how to create NFS 4.1 datastores. Although the procedure is similar to NFSv3, there are a few additional steps that needs to be performed. Getting readyFor Kerberos authentication to work, you need to make sure that the ESXi hosts and the NFS Server is joined to the Active Directory domainCreate a new or select an existing AD user for NFS Kerberos authenticationConfigure the NFS Server/Share to Allow access to the AD user chosen for NFS Kerberos authentication How to do it...The following procedure will help you mount an NFS datasture using the NFSv4.1 client with Kerberos authentication enabled:Log in to the vCenter Server using the vSphere Web Client and use the key combination Ctrl+Alt+2 to switch to the Host and Clusters view, select the desired ESXi host and navigate to it  Configure | System | Authentication Services section and supply the credentials of the Active Directory user that was chosen for NFS Kerberon Authentication:Right-click on the desired ESXi host and go to Storage | New Datastore to bring-up the Add Storage wizard.On the New Datastore wizard, select the Type as NFS and click Next to continue.On the Select NFS version screen, select NFS 4.1 and click Next to continue. Keep in mind that it is not recommended to mount an NFS Export using both NFS3 and NFS4.1 client. On the Name and Configuration screen, supply a Name for the Datastore, the NFS export's folder path and NFS Server's IP Address or FQDN. You can also choose to mount the share as ready-only if desired:On the Configure Kerberos Authentication screen, check the Enable Kerberos-based authentication box and choose the type of authentication required and click Next to continue:On the Ready to Complete screen review the settings and click Finish to mount the NFS export. Enabling storage I/O controlThe use of disk shares will work just fine as long as the datastore is seen by a single ESXi host. Unfortunately, that is not a common case. Datastores are often shared among multiple ESXi hosts. When datastores are shared, you bring in more than one local host scheduler into the process of balancing the I/O among the virtual machines. However, these lost host schedules cannot talk to each other and their visibility is limited to the ESXi hosts they are running on. This easily contributes to a serious problem called thenoisy neighbor situation. The job of SIOC is to enable some form of communication between local host schedulers so that I/O can be balanced between virtual machines running on separate hosts.  How to do it...The following procedure will help you enable SIOC on a datastore:Connect to the vCenter Server using the Web Client and switch to the Storage view using the key combination Ctrl+Alt+4.Right-click on the desired datastore and go to Configure Storage I/O Control:On the Configure Storage I/O Control window, select the checkbox Enable Storage I/O Control, set a custom congestion threshold (only if needed) and click OK to confirm the settings: With the Virtual Machine selected from the inventory, navigate to its Configure | General tab and review its datastore capability settings to ensure that SIOC is enabled:  How it works...As mentioned earlier, SIOC enables communication between these local host schedulers so that I/O can be balanced between virtual machines running on separate hosts. It does so by maintaining a shared file in the datastore that all hosts can read/write/update. When SIOC is enabled on a datastore, it starts monitoring the device latency on the LUN backing the datastore. If the latency crosses the threshold, it throttles the LUN's queue depth on each of the ESXi hosts in an attempt to distribute a fair share of access to the LUN for all the Virtual Machines issuing the I/O.The local scheduler on each of the ESXi hosts maintains an iostats file to keep its companion hosts aware of the device I/O statistics observed on the LUN. The file is placed in a directory (naa.xxxxxxxxx) on the same datastore.For example, if there are six virtual machines running on three different ESXi hosts, accessing a shared LUN. Among the six VMs, four of them have a normal share value of 1000 and the remaining two have high (2000) disk share value sets on them. These virtual machines have only a single VMDK attached to them. VM-C on host ESX-02 is issuing a large number of I/O operations. Since that is the only VM accessing the shared LUN from that host, it gets the entire queue's bandwidth. This can induce latency on the I/O operations performed by the other VMs: ESX-01 and ESX-03. If the SIOC detects the latency value to be greater than the dynamic threshold, then it will start throttling the queue depth: The throttled DQLEN for a VM is calculated as follows:DQLEN for the VM = (VM's Percent of Shares) of (Queue Depth)Example: 12.5 % of 64 → (12.5 * 64)/100 = 8The throttled DQLEN per host is calculated as follows:DQLEN of the Host = Sum of the DQLEN of the VMs on itExample: VM-A (8) + VM-B(16) = 24The following diagram shows the effect of SIOC throttling the queue depth: SummaryIn this article we learnt, how to mount a VMFS volume from a LUN detected as a snapshot, how to mount an NFS datasture using the NFSv4.1 client with Kerberos authentication enabled, and how to enable SIOC on a datastore. Resources for Article:   Further resources on this subject: Essentials of VMware vSphere [article] Working with VMware Infrastructure [article] Network Virtualization and vSphere [article]
Read more
  • 0
  • 0
  • 3182
article-image-kvm-networking-libvirt
Packt
05 Jul 2017
10 min read
Save for later

KVM Networking with libvirt

Packt
05 Jul 2017
10 min read
In this article by Konstantin Ivanov, author of the book KVM Virtualization Cookbook, we are going to deploy three different network types, explore the network XML format and see examples on how to define and manipulate virtual interfaces for the KVM instances. (For more resources related to this topic, see here.) To be able to connect the virtual machines to the host OS, or to each other, we are going to use the Linux bridge and the Open vSwitch (OVS) daemons, userspace tools and kernel modules. Both software bridging technologies are great at creating software defined networks (SDN) of various complexity, in a consistent and easy to manipulate manner. The Linux bridge and OVS both act as a bridge/switch that the virtual interfaces of the KVM guests can connect to. With all this in mind, lets start by learning more about the software bridges in Linux. The Linux bridge The Linux bridge is a software Layer 2 device that provides some of the functionality of a physical bridge device. It can forward frames between KVM guests, the host OS and virtual machines running on other servers, or networks. The Linux bridge consists of two components - a userspace administration tool that we are going to use in this recipe and a kernel module, that performs all the work of connecting multiple Ethernet segments together. Each software bridge we create can have a number of ports attached to it, where network traffic is forwarded to and from. When creating KVM instances we can attach the virtual interfaces that are associated with them to the bridge, which is similar to plugging a network cable from a physical server's Network Interface Card (NIC) to a bridge/switch device. Being a Layer 2 device, the Linux bridge works with MAC addresses and maintains a kernel structure to keep track of ports and associated MAC addresses in the form of a Content Addressable Memory (CAM) table. In this recipe we are going to create a new Linux bridge and use the brctl utility to manipulate it. Getting Ready For this recipe we are going to need the following: Recent Linux kernel with enabled 802.1d Ethernet Bridging options. To check if your kernel is compiled with those features, or exposed as kernel modules, run: root@kvm:~# cat /boot/config-`uname -r` | grep -i bridg # PC-card bridges CONFIG_BRIDGE_NETFILTER=y CONFIG_NF_TABLES_BRIDGE=m CONFIG_BRIDGE_EBT_BROUTE=m CONFIG_BRIDGE_EBT_T_FILTER=m CONFIG_BRIDGE_EBT_T_NAT=m CONFIG_BRIDGE_EBT_802_3=m CONFIG_BRIDGE_EBT_AMONG=m CONFIG_BRIDGE_EBT_ARP=m CONFIG_BRIDGE_EBT_IP=m CONFIG_BRIDGE_EBT_IP6=m CONFIG_BRIDGE_EBT_LIMIT=m CONFIG_BRIDGE_EBT_MARK=m CONFIG_BRIDGE_EBT_PKTTYPE=m CONFIG_BRIDGE_EBT_STP=m CONFIG_BRIDGE_EBT_VLAN=m CONFIG_BRIDGE_EBT_ARPREPLY=m CONFIG_BRIDGE_EBT_DNAT=m CONFIG_BRIDGE_EBT_MARK_T=m CONFIG_BRIDGE_EBT_REDIRECT=m CONFIG_BRIDGE_EBT_SNAT=m CONFIG_BRIDGE_EBT_LOG=m # CONFIG_BRIDGE_EBT_ULOG is not set CONFIG_BRIDGE_EBT_NFLOG=m CONFIG_BRIDGE=m CONFIG_BRIDGE_IGMP_SNOOPING=y CONFIG_BRIDGE_VLAN_FILTERING=y CONFIG_SSB_B43_PCI_BRIDGE=y CONFIG_DVB_DDBRIDGE=m CONFIG_EDAC_SBRIDGE=m # VME Bridge Drivers root@kvm:~# The bridge kernel module. To verify that the module is loaded and to obtain more information about its version and features, execute: root@kvm:~# lsmod | grep bridge bridge 110925 0 stp 12976 2 garp,bridge llc 14552 3 stp,garp,bridge root@kvm:~# modinfo bridge filename: /lib/modules/3.13.0-107-generic/kernel/net/bridge/bridge.ko alias: rtnl-link-bridge version: 2.3 license: GPL srcversion: 49D4B615F0B11CA696D8623 depends: stp,llc intree: Y vermagic: 3.13.0-107-generic SMP mod_unload modversions signer: Magrathea: Glacier signing key sig_key: E1:07:B2:8D:F0:77:39:2F:D6:2D:FD:D7:92:BF:3B:1D:BD:57:0C:D8 sig_hashalgo: sha512 root@kvm:~# The bridge-utils package, that provides the tool to create and manipulate the Linux bridge. The ability to create new KVM guests using libvirt, or the QEMU utilities How to do it... Install the Linux bridge package, if it is not already present: root@kvm:~# apt install bridge-utils Build a new KVM instance using the raw image: root@kvm:~# virt-install --name kvm1 --ram 1024 --disk path=/tmp/debian.img,format=raw --graphics vnc,listen=146.20.141.158 --noautoconsole --hvm --import Starting install... Creating domain... | 0 B 00:00 Domain creation completed. You can restart your domain by running: virsh --connect qemu:///system start kvm1 root@kvm:~# List all available bridge devices: root@kvm:~# brctl show bridge name bridge id STP enabled interfaces virbr0 8000.fe5400559bd6 yes vnet0 root@kvm:~# Bring the virtual bridge down, delete it and ensure it's been deleted: root@kvm:~# ifconfig virbr0 down root@kvm:~# brctl delbr virbr0 root@kvm:~# brctl show bridge name bridge id STP enabled interfaces root@kvm:~# Create a new bridge and bring it up: root@kvm:~# brctl addbr virbr0 root@kvm:~# brctl show bridge name bridge id STP enabled interfaces virbr0 8000.000000000000 no root@kvm:~# ifconfig virbr0 up root@kvm:~# Assign an IP address to the bridge: root@kvm:~# ip addr add 192.168.122.1 dev virbr0 root@kvm:~# ip addr show virbr0 39: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default link/ether 32:7d:3f:80:d7:c6 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/32 scope global virbr0 valid_lft forever preferred_lft forever inet6 fe80::307d:3fff:fe80:d7c6/64 scope link valid_lft forever preferred_lft forever root@kvm:~# List the virtual interfaces on the host OS: root@kvm:~# ip a s | grep vnet 38: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 500 root@kvm:~# Add the virtual interface vnet0 to the bridge: root@kvm:~# brctl addif virbr0 vnet0 root@kvm:~# brctl show virbr0 bridge name bridge id STP enabled interfaces virbr0 8000.fe5400559bd6 no vnet0 root@kvm:~# Enable the Spanning Tree Protocol (STP) on the bridge and obtain more information: root@kvm:~# brctl stp virbr0 on root@kvm:~# brctl showstp virbr0 virbr0 bridge id 8000.fe5400559bd6 designated root 8000.fe5400559bd6 root port 0 path cost 0 max age 20.00 bridge max age 20.00 hello time 2.00 bridge hello time 2.00 forward delay 15.00 bridge forward delay 15.00 ageing time 300.00 hello timer 0.26 tcn timer 0.00 topology change timer 0.00 gc timer 90.89 flags vnet0 (1) port id 8001 state forwarding designated root 8000.fe5400559bd6 path cost 100 designated bridge 8000.fe5400559bd6 message age timer 0.00 designated port 8001 forward delay timer 0.00 designated cost 0 hold timer 0.00 flags root@kvm:~# Test connectivity from the KVM instance to the host OS: root@kvm:~# virsh console kvm1 Connected to domain kvm1 Escape character is ^] root@debian:~# ip a s eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:55:9b:d6 brd ff:ff:ff:ff:ff:ff inet 192.168.122.92/24 brd 192.168.122.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::5054:ff:fe55:9bd6/64 scope link valid_lft forever preferred_lft forever root@debian:~# root@debian:~# ping 192.168.122.1 -c 3 PING 192.168.122.1 (192.168.122.1) 56(84) bytes of data. 64 bytes from 192.168.122.1: icmp_seq=1 ttl=64 time=0.276 ms 64 bytes from 192.168.122.1: icmp_seq=2 ttl=64 time=0.226 ms 64 bytes from 192.168.122.1: icmp_seq=3 ttl=64 time=0.259 ms --- 192.168.122.1 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 1999ms rtt min/avg/max/mdev = 0.226/0.253/0.276/0.027 ms root@debian:~# How it works... When we first installed and started the libvirt daemon, few things happened automatically: A new Linux bridge was created with the name and IP address defined in the /etc/libvirt/qemu/networks/default.xml configuration file. The dnsmasq service was started with a configuration specified in the /var/lib/libvirt/dnsmasq/default.conf file. Lets examine the default libvirt bridge configuration: root@kvm:~# cat /etc/libvirt/qemu/networks/default.xml <network> <name>default</name> <bridge name="virbr0"/> <forward/> <ip address="192.168.122.1" netmask="255.255.255.0"> <dhcp> <range start="192.168.122.2" end="192.168.122.254"/> </dhcp> </ip> </network> root@kvm:~# This is the default network that libvirt created for us, specifying the bridge name, IP address and the IP range used by the DHCP server that was started. We are going to talk about libvirt networking in much more details later in this article, however we are showing it here to help you understand where all the IP addresses and the bridge name came from.We can see that a DHCP server is running on the host OS and its configuration file, by running: root@kvm:~# pgrep -lfa dnsmasq 38983 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf root@kvm:~# cat /var/lib/libvirt/dnsmasq/default.conf ##WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE ##OVERWRITTEN AND LOST. Changes to this configuration should be made using: ## virsh net-edit default ## or other application using the libvirt API. ## ## dnsmasq conf file created by libvirt strict-order user=libvirt-dnsmasq pid-file=/var/run/libvirt/network/default.pid except-interface=lo bind-dynamic interface=virbr0 dhcp-range=192.168.122.2,192.168.122.254 dhcp-no-override dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases dhcp-lease-max=253 dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts root@kvm:~# From the preceding configuration file, notice how the IP address range for the DHCP service and the name of the virtual bridge match what is configured in the default libvirt network file that we just saw. With all this in mind let's step through all the actions we performed earlier: In step 1, we installed the userspace tool brctl that we use to create, configure and inspect the Linux bridge configuration in the Linux kernel. In step 2, we provisioned a new KVM instance using a custom raw image containing the guest OS. In step 3, we invoked the bridge utility to list all available bridge devices. From the output we can observe that currently there's one bridge, named virbr0, which libvirt created automatically. Notice that under the interfaces column we can see the vnet0 interface. This is the virtual NIC that was exposed to the host OS, when we started the KVM instance. This means that the virtual machine is connected to the host bridge. In step 4, we first bring the bridge down in order to delete it, then we use the brctl command again to remove the bridge and ensure it's not present on the host OS. In step 5, we recreated the bridge and brought it back up. We do this to demonstrate the steps required to create a new bridge. In step 6, we re-assigned the same IP address to the bridge and listed it. In steps 7 and 8 we list all virtual interfaces on the host OS. Since we only have one KVM guest currently running on the server, we only see one virtual interface - vnet0. We then proceed to add/connect the virtual NIC to the bridge. In step 9, we enabled the Spanning Tree Protocol (STP) on the bridge. STP is a layer 2 protocol that helps prevent network loops if we have redundant network paths. This is especially useful in larger, more complex network topologies, where multiple bridges are connected together. Finally, in step 10, we connect to the KVM guest using the console, list its interface configuration and ensure we can ping the bridge on the host OS. Notice that the IP address of the guest OS was automatically assigned by the dnsmasq server running on the host, as it is a part of the IP range defined in the configuration file we saw earlier. Summary We have covered the Linux bridge in detail here. We have deployed three different network types, explored the network XML format and have seen examples on how to define and manipulate virtual interfaces for the KVM instances. Resources for Article: Further resources on this subject: A Virtual Machine for a Virtual World [article] Getting Started with Ansible [article] Supporting hypervisors by OpenNebula [article]
Read more
  • 0
  • 0
  • 18911

article-image-introduction-sdn-transformation-legacy-sdn
Packt
07 Jun 2017
23 min read
Save for later

Introduction to SDN - Transformation from legacy to SDN

Packt
07 Jun 2017
23 min read
In this article, by Reza Toghraee, the author of the book, Learning OpenDayLight, we will: What is and what is not SDN Components of a SDN Difference between SDN and overlay The SDN controllers (For more resources related to this topic, see here.) You might have heard about Software-Defined Networking (SDN). If you are in networking industry this is a topic which probably you have studied initially when first time you heard about the SDN.To understand the importance of SDN and SDN controller, let's look at Google. Google silently built its own networking switches and controller called Jupiter. A home grown project which is mostly software driven and supports such massive scale of Google. The SDN base is There is a controller who knows the whole network. OpenDaylight (ODL), is a SDN controller. In other words, it's the central brain of the network. Why we are going towards SDN Everyone who is hearing about SDN, should ask this question that why are we talking about SDN. What problem is it trying to solve? If we look at traditional networking (layer 2, layer 3 with routing protocols such as BGP, OSPF) we are completely dominated by what is so called protocols. These protocols in fact have been very helpful to the industry. They are mostly standard. Different vendor and products can communicate using standard protocols with each other. A Cisco router can establish a BGP session with a Huawei switch or an open source Quagga router can exchange OSPF routes with a Juniper firewall. Routing protocol is a constant standard with solid bases. If you need to override something in your network routing, you have to find a trick to use protocols, even by using a static route. SDN can help us to come out of routing protocol cage, look at different ways to forward traffic. SDN can directly program each switch or even override a route which is installed by routing protocol. There are high-level benefits of using SDN which we explain few of them as follows: An integrated network: We used to have a standalone concept in traditional network. Each switch was managed separately, each switch was running its own routing protocol instance and was processing routing information messages from other neighbors. In SDN, we are migrating to a centralized model, where the SDN controller becomes the single point of configuration of the network, where you will apply the policies and configuration. Scalable layer 2 across layer 3:Having a layer 2 network across multiple layer 3 network is something which all network architects are interested and till date we have been using proprietary methods such as OTV or by using a service provider VPLS service. With SDN, we can create layer 2 networks across multiple switches or layer 3 domains (using VXLAN) and expand the layer 2 networks. In many cloud environment, where the virtual machines are distributed across different hosts in different datacenters, this is a major requirement. Third-party application programmability: This is a very generic term, isn't it? But what I'm referring to is to let other applications communicate with your network. For example,In many new distributed IP storage systems, the IP storage controller has ability to talk to network to provide the best, shortest path to the storage node. With SDN we are letting other applications to control the network. Of course this control has limitation and SDN doesn't allow an application to scrap the whole network. Flexible application based network:In SDN, everything is an application. L2/L3, BGP, VMware Integration, and so on all are applications running in SDN controller. Service chaining:On the fly you add a firewall in the path or a load balancer. This is service insertion. Unified wired and wireless: This is an ideal benefit, to have a controller which supports both wired and wireless network. OpenDaylight is the only controller which supports CAPWAP protocols which allows integration with wireless access points. Components of a SDN A software defined network infrastructure has two main key components: The SDN Controller (only one, could be deployed in a highly available cluster) The SDN enabled switches(multiple switches, mostly in a Clos topology in a datacenter):   SDN controller is the single brain of the SDN domain. In fact, an SDN domain is very similar to a chassis based switch. You can imagine supervisor or management module of a chassis based switch as a SDN controller and rest of the line card and I/O cards as SDN switches. The main difference between a SDN network and a chassis based switch is that you can scale out the SDN with multiple switches, where in a chassis based switch you are limited to the number of slots in that chassis: Controlling the fabric It is very important that you understand the main technologies involved in SDN. These methods are used by SDN controllers to manage and control the SDN network. In general, there are two methods available for controlling the fabric: Direct fabric programming: In this method, SDN controller directly communicates with SDN enabled switches via southbound protocols such as OpenFlow, NETCONF and OVSDB. SDN controller programs each switch member with related information about fabric, and how to forward the traffic. Direct fabric programming is the method used by OpenDaylight. Overlay:In overlay method, SDN controller doesn't rely on programing the network switches and routers. Instead it builds an virtual overlay network on top of existing underlay network. The underlay network can be a L2 or L3 network with traditional network switches and router, just providing IP connectivity. SDN controller uses this platform to build the overlay using encapsulation protocols such as VXLAN and NVGRE. VMware NSX uses overlay technology to build and control the virtual fabric. SDN controllers One of the key fundamentals of SDN is disaggregation. Disaggregation of software and hardware in a network and also disaggregation of control and forwarding planes. SDN controller is the main brain and controller of an SDN environment, it's a strategic control point within the network and responsible for communicating information to: Routers and switches and other network devices behind them. SDN controllers uses APIs or protocols (such as OpenFlow or NETCONF) to communicate with these devices. This communication is known as southbound Upstream switches, routers or applications and the aforementioned business logic (via APIs or protocols). This communication is known as northbound. An example for a northbound communication is a BGP session between a legacy router and SDN controller. If you are familiar with chassis based switches like Cisco Catalyst 6500 or Nexus 7k chassis, you can imagine a SDN network as a chassis, with switches and routers as its I/O line cards and SDN controller as its supervisor or management module. Infact SDN is similar to a very scalable chassis where you don't have any limitation on number of physical slots. SDN controller is similar to role of management module of a chassis based switch and it controls all switches via its southbound protocols and APIs. The following table compares the SDN controller and a chassis based switch:  SDN Controller Chassis based switch Supports any switch hardware Supports only specific switch line cards Can scale out, unlimited number of switches Limited to number of physical slots in the chassis Supports high redundancy by multiple controllers in a cluster Supports dual management redundancy, active standby Communicates with switches via southbound protocols such as OpenFlow, NETCONF, BGP PCEP Use proprietary protocols between management module and line cards Communicates with routers, switches and applications outside of SDN via northbound protocols such as BGP, OSPF and direct API Communicates with other routers and switches outside of chassis via standard protocols such as BGP, OSPF or APIs. The first protocol that popularized the concept behind SDN was OpenFlow. When conceptualized by networking researchers at Stanford back in 2008, it was meant to manipulate the data plane to optimize traffic flows and make adjustments, so the network could quickly adapt to changing requirements. Version 1.0 of the OpenFlow specification was released in December of 2009; it continues to be enhanced under the management of the Open Networking Foundation, which is a user-led organization focused on advancing the development of open standards and the adoption of SDN technologies. OpenFlow protocol was the first protocol that helped in popularizing SDN. OpenFlow is a protocol designed to update the flow tables in a switch. Allowing the SDN controller to access the forwarding table of each member switch or in other words to connect control plane and data plane in SDN world. Back in 2008, OpenFlow conceptualized by networking researchers at Stanford University, the initial use of OpenFlow was to alter the switch forwarding tables to optimize traffic flows and make adjustments, so the network could quickly adapt to changing requirements. After introduction of OpenFlow, NOX introduced as original OpenFlow controller (still there wasn't concept of SDN controller). NOX was providing a high-level API capable of managing and also developing network control applications. Separate applications were required to run on top of NOX to manage the network.NOX was initially developed by Nicira networks (which acquired by VMware, and finally became part of VMware NSX). NOX introduced along with OpenFlow in 2009. NOX was a closed source product but ultimately it was donated to SDN community which led to multiple forks and sub projects out of original NOX. For example, POX is a sub project of NOX which provides Python support. Both NOX and POX were early controllers. NOX appears an inactive development, however POX is still in use by the research community as it is a Python based project and can be easily deployed. POX is hosted at http://github.com/noxrepo/pox NOX apart from being the first OpenFlow or SDN controller also established a programing model which inherited by other subsequent controllers. The model was based on processing of OpenFlow messages, with each incoming OpenFlow message trigger an event that had to be processed individually. This model was simple to implement but not efficient and robust and couldn't scale. Nicira along with NTT and Google started developing ONIX, which was meant to be a more abstract and scalable for large deployments. ONIX became the base for Nicira (the core of VMware NSX or network virtualization platform) also there are rumors that it is also the base for Google WAN controller. ONIX was planned to become open source and donated to community but for some reasons the main contributors decided to not to do it which forced the SDN community to focus on developing other platforms. Started in 2010, a new controller introduced,the Beacon controller and it became one of the most popular controllers. It born with contribution of developers from Stanford University. Beacon is a Java-based open source OpenFlow controller created in 2010. It has been widely used for teaching, research, and as the basis of Floodlight. Beacon had the first built-in web user-interface which was a huge step forward in the market of SDN controllers. Also it provided a easier method to deploy and run compared to NOX. Beacon was an influence for design of later controllers after it, however it was only supporting star topologies which was one of the limitations on this controller. Floodlight was a successful SDN controller which was built as a fork of Beacon. BigSwitch networks is developing Floodlight along with other developers. In 2013, Beacon popularity started to shrink down and Floodlight started to gain popularity. Floodlight had fixed many issues of Beacon and added lots of additional features which made it one of the most feature rich controllers available. It also had a web interface, a Java-based GUI and also could get integrated with OpenStack using quantum plugin. Integration with OpenStack was a big step forward as it could be used to provide networking to a large pool of virtual machines, compute and storage. Floodlight adoption increased by evolution of OpenStack and OpenStack adopters. This gave Floodlight greater popularity and applicability than other controllers that came before. Most of controllers came after Floodlight also supported OpenStack integration. Floodlight is still supported and developed by community and BigSwitch networks, and is a base for BigCloud Fabric (the BigSwitch's commercial SDN controller). There are other open source SDN controllers which introduced such as Trema (ruby-based from NEC), Ryu (supported by NTT), FlowER, LOOM and the recent OpenMUL. The following table shows the current open source SDN controllers:  Active open source SDNcontroller Non-active open source SDN controllers Floodlight Beacon OpenContrail FlowER OpenDaylight NOX LOOM NodeFlow OpenMUL   ONOS   POX   Ryu   Trema     OpenDaylight OpenDaylight started in early 2013, and was originally led by IBM and Cisco. It was a new collaborative open source project. OpenDaylight hosted under Linux Foundation and draw support and interest from many developers and adopters. OpenDaylight is a platform to provide common foundations and a robust array of services for SDN environments. OpenDaylight uses a controller model which supports OpenFlow as well as other southbound protocols. It is the first open source controller capable of employing non-OpenFlow proprietary control protocols which eventually lets OpenDaylight to integrate with modern and multi-vendor networks. The first release of OpenDaylight in February 2014 with code name of Hydrogen, followed by Helium in September 2014. The Helium release was significant because it marked a change in direction for the platform that has influenced the way subsequent controllers have been architected. The main change was in the service abstraction layer, which is the part of the controller platform that resides just above the southbound protocols, such as OpenFlow, isolating them from the northbound side and where the applications reside. Hydrogen used an API-driven Service Abstraction Layer (AD-SAL), which had limitations specifically, it meant the controller needed to know about every type of device in the network AND have an inventory of drivers to support them. Helium introduced a Model-driven service abstraction layer (MD-SAL), which meant the controller didn't have to account for all the types of equipment installed in the network, allowing it to manage a wide range of hardware and southbound protocols. Helium release made the framework much more agile and adaptable to changes in the applications; an application could now request changes to the model, which would be received by the abstraction layer and forwarded to the network devices. The OpenDaylight platform built on this advancement in its third release, Lithium, which was introduced in June of 2015. This release focused on broadening the programmability of the network, enabling organizations to create their own service architectures to deliver dynamic network services in a cloud environment and craft intent-based policies. Lithium release was worked on by more than 400 individuals, and contributions from Big Switch Networks, Cisco, Ericsson, HP, NEC, and so on, making it one of the fastest growing open source projects ever. The fourth release, Beryllium come out in February of 2016 and the most recent fifth release, Boron released in September 2016. Many vendors have built and developed commercial SDN controller solutions based on OpenDaylight. Each product has enhanced or added features to OpenDaylight to have some differentiating factor. The use of OpenDaylight in different vendor products are: A base, but sell a commercial version with additional proprietary functionality—for example: Brocade, Ericsson, Ciena, and so on. Part of their infrastructure in their Network as a Service (or XaaS) offerings—for example: Telstra, IBM, and so on. Elements for use in their solution—for example: ConteXtream (now part of HP) Open Networking Operating System (ONOS), which was open sourced in December 2014 is focused on serving the needs of service providers. It is not as widely adopted as OpenDaylight, ONOS has been finding success and gaining momentum around WAN use cases. ONOS is backed by numerous organizations including AT&T, Cisco, Fujitsu, Ericsson, Ciena, Huawei, NTT, SK Telecom, NEC, and Intel, many of whom are also participants in and supporters of OpenDaylight. Apart from open source SDN controllers, there are many commercial, proprietary controllers available in the market. Products such as VMware NSX, Cisco APIC, BigSwitch Big Cloud Fabric, HP VAN and NEC ProgrammableFlow are example commercial and proprietary products. The following table lists the commercially available controllers and their relationship to OpenDaylight:  ODL-based ODL-friendly Non-ODL based Avaya Cyan (acquired by Ciena) BigSwitch Brocade HP Juniper Ciena NEC Cisco ConteXtream (HP) Nuage Plexxi Coriant   PLUMgrid Ericsson Pluribus Extreme Sonus Huawei (also ships non-ODL controller) VMware NSX Core features of SDN Regardless of an open source or a proprietary SDN platform, there are core features and capabilities which requires the SDN platform to support. These capabilities include: Fabric programmability:Providing the ability to redirect traffic, apply filters to packets (dynamically), and leverage templates to streamline the creation of custom applications. Ensuring northbound APIs allow the control information centralized in the controller available to be changed by SDN applications. This will ensure the controller can dynamically adjust the underlying network to optimize traffic flows to use the least expensive path, take into consideration varying bandwidth constraints, meet quality of service (QoS) requirements. Southbound protocol support:Enabling the controller to communicate to switches and routers and manipulate and optimize how they manage the flow of traffic. Currently OpenFlow is the most standard protocol used between different networking vendors, while there are other southbound protocols that can be used. A SDN platform should support different versions of OpenFlow in order to provide compatibility with different switching equipments. External API support:Ensuring the controller can be used within the varied orchestration and cloud environments such as VMware vSphere, OpenStack, and so on. Using APIs the orchestration platform can communicate with SDN platform in order to publish network policies. For example VMware vSphere shall talk to SDN platform to extend the virtual distributed switches(VDS) from virtual environment to the physical underlay network without any requirement form an network engineer to configure the network. Centralized monitoring and visualization:Since SDN controller has a full visibility over the network, it can offer end-to-end visibility of the network and centralized management to improve overall performance, simplify the identification of issues and accelerate troubleshooting. The SDN controller will be able to discover and present a logical abstraction of all the physical links in the network, also it can discover and present a map of connected devices (MAC addresses) which are related to virtual or physical devices connected to the network. The SDN controller support monitoring protocols, such as syslog, snmp and APIs in order to integrate with third-party management and monitoring systems. Performance: Performance in a SDN environment mainly depends on how fast SDN controller fills the flow tables of SDN enabled switches. Most of SDN controllers pre-populate the flow tables on switches to minimize the delay. When a SDN enabled switch receives a packet which doesn't find a matching entry in its flow table, it sends the packet to the SDN controller in order to find where the packet needs to get forwarded to. A robust SDN solution should ensure that the number of requests form switches are minimum and SDN controller doesn't become a bottleneck in the network. High availability and scalability: Controllers must support high availability clusters to ensure reliability and service continuity in case of failure of a controller. Clustering in SDN controller expands to scalability. A modern SDN platform should support scalability in order to add more controller nodes with load balancing in order to increase the performance and availability. Modern SDN controllers support clustering across multiple different geographical locations. Security:Since all switches communicate with SDN controller, the communication channel needs to be secured to ensure unauthorized devices doesn't compromise the network. SDN controller should secure the southbound channels, use encrypted messaging and mutual authentication to provide access control. Apart from that the SDN controller must implement preventive mechanisms to prevent from denial of services attacks. Also deployment of authorization levels and level controls for multi-tenant SDN platforms is a key requirement. Apart from the aforementioned features SDN controllers are likely to expand their function in future. They may become a network operating system and change the way we used to build networks with hardware, switches, SFPs and gigs of bandwidth. The future will look more software defined, as the silicon and hardware industry has already delivered their promises for high performance networking chips of 40G, 100G. Industry needs more time to digest the new hardware and silicons and refresh the equipment with new gears supporting 10 times the current performance. Current SDN controllers In this section, I'm putting the different SDN controllers in a table. This will help you to understand the current market players in SDN and how OpenDaylight relates to them:  Vendors/product Based on OpenDaylight? Commercial/open source Description Brocade SDN controller Yes Commercial It's a commercial version of OpenDaylight, fully supported and with extra reliable modules. Cisco APIC No Commercial Cisco Application Policy Infrastructure Controller (APIC) is the unifying automation and management point for the Application Centric Infrastructure (ACI) data center fabric. Cisco uses APIC controller and Nexus 9k switches to build the fabric. Cisco uses OpFlex as main southbound protocol. Erricson SDN controller Yes Commercial Ericsson's SDN controller is a commercial (hardened) version OpenDaylight SDN controller. Domain specific control applications that use the SDN controller as platform form the basis of the three commercial products in our SDN controller portfolio. Juniper OpenContrial /Contrail No Both OpenContrail is opensource, and Contrail itself is a commercial product. Juniper Contrail Networking is an open SDN solution that consists of Contrail controller, Contrail vRouter, an analytics engine, and published northbound APIs for cloud and NFV. OpenContrail is also available for free from Juniper. Contrail promotes and use MPLS in datacenter. NEC Programmable Flow No Commercial NEC provides its own SDN controller and switches. NEC SDN platform is one of choices of enterprises and has lots of traction and some case studies.   Avaya SDN Fx controller Yes Commercial Based on OpenDaylight, bundled as a solution package.   Big Cloud Fabric No Commercial BigSwitch networks solution is based on Floodlight opensource project. BigCloud Fabric is a robust, clean SDN controller and works with bare metal whitebox switches. BigCloud Fabric includes SwitchLightOS which is a switch operating system can be loaded on whitebox switches with Broadcom Trident 2 or Tomahawk silicons. The benefit of BigCloud Fabric is that you are not bound to any hardware and you can use baremetal switches from any vendor.   Ciena's Agility Yes Commercial Ciena's Agility multilayer WAN controller is built atop the open-source baseline of the OpenDaylight Project—an open, modular framework created by a vendor-neutral ecosystem (rather than a vendor-centric ego-system) that will enable network operators to source network services and applications from both Ciena's Agility and others. HP VAN (Virtual Application Network) No Commercial The building block of the HP open SDN ecosystem, the controller allows third-party developers to deliver innovative SDN solutions. Huawei Agile controller Yes and No (based on editions) Commercial Huawei's SDN controller which integrates as a solution with Huawei enterprise switches Nuage No Commercial Nuage Networks VSP provides SDN capabilities for clouds of all sizes. It is implemented as a non-disruptive overlay for all existing virtualized and non-virtualized server and network resources. Pluribus Netvisor No Commercial Netvisor Premium and Open Netvisor Linux are distributed network operating systems. Open Netvisor integrates atraditional, interoperable networking stack (L2/L3/VXLAN) with an SDN distributed controller that runs in everyswitch of the fabric. VMware NSX No Commercial VMware NSX is an Overlay type of SDN, which currently works with VMware vSphere. The plan is to support OpenStack in future. VMware NSX also has built-in firewall, router and L4 load balancers allowing micro segmentation. OpenDaylight as an SDN controller Previously, we went through the role of SDN controller, and a brief history of ODL.ODL is a modular open SDN platform which allows developers to build any network or business application on top of it to drive the network in the way they want. Currently OpenDaylight has reached to its fifth release (Boron, which is the fifth element in periodic table). ODL releases are named based on periodic table elements, started from first release the Hydrogen. ODL has a 6 month release period, with many developers working on expanding the ODL, 2 releases per year is expected from community. For technical readers to understand it more clearly, the following diagram will help: ODL platform has a broad set of use cases for multivendor, brown field, green fields, service providers and enterprises. ODL is a foundation for networks of the future. Service providers are using ODL to migrate their services to a software enabled level with automatic service delivery and coming out of circuit-based mindset of service delivery. Also they work on providing a virtualized CPE with NFV support in order to provide flexible offerings. Enterprises use ODL for many use cases, from datacenter networking, Cloud and NFS, network automation and resource optimization, visibility, control to deploying a fully SDN campus network. ODL uses a MD-SAL which makes it very scalable and lets it incorporate new applications and protocols faster. ODL supports multiple standard and proprietary southbound protocols, for example with full support of OpenFlow and OVSDB, ODL can communicate with any standard hardware (or even the virtual switches such as Open vSwitch(OVS) supporting such protocols). With such support, ODL can be deployed and used in multivendor environments and control hardware from different vendors from a single console no matter what vendor and what device it is, as long as they support standard southbound protocols. ODL uses a micro service architecture model which allows users to control applications, protocols and plugins while deploying SDN applications. Also ODL is able to manage the connection between external consumers and providers. The followingdiagram explains the ODL footprint and different components and projects within the ODL: Micro servicesarchitecture ODL stores its YANG data structure in a common data store and uses messaging infrastructure between different components to enable a model-driven approach to describe the network and functions. In ODL MD-SAL, any SDN application can be integrated as a service and then loaded into the SDN controller. These services (apps) can be chained together in any number and ways to match the application needs. This concept allows users to only install and enable the protocols and services they need which makes the system light and efficient. Also services and applications created by users can be shared among others in the ecosystem since the SDN application deployment for ODL follows a modular design. ODL supports multiple southbound protocols. OpenFlow and OpenFlow extension such as Table Type Patterns (TTP), as well as other protocols including NETCONF, BGP/PCEP, CAPWAP and OVSDB. Also ODL supports Cisco OpFlex protocol: ODL platform provides a framework for authentication, authorization and accounting (AAA), as well as automatic discovery and securing of network devices and controllers. Another key area in security is to use encrypted and authenticated communication trough southbound protocols with switches and routers within the SDN domain. Most of southbound protocols support security encryption mechanisms. Summary In this article we learned about SDN, and why it is important. We reviewed the SDN controller products, the ODL history as well as core features of SDN controllers and market leader controllers. We managed to dive in some details about SDN . Resources for Article: Further resources on this subject: Managing Network Devices [article] Setting Up a Network Backup Server with Bacula [article] Point-to-Point Networks [article]
Read more
  • 0
  • 0
  • 6057

article-image-learning-basic-powercli-concepts
Packt
05 Jan 2017
7 min read
Save for later

Learning Basic PowerCLI Concepts

Packt
05 Jan 2017
7 min read
In this article, by Robert van den Nieuwendijk, author of the book Learning PowerCLI - Second Edition, you will learn some basic PowerShell and PowerCLI concepts. Knowing these concepts will make it easier for you to learn the advanced topics. We will cover the Get-Command, Get-Help, and Get-Member cmdlets in this article. (For more resources related to this topic, see here.) Using the Get-Command, Get-Help, and Get-Member cmdlets There are some PowerShell cmdlets that everyone should know. Knowing these cmdlets will help you discover other cmdlets, their functions, parameters, and returned objects. Using Get-Command The first cmdlet that you should know is Get-Command. This cmdlet returns all the commands that are installed on your computer. The Get-Command cmdlet has the following syntax: Get-Command [[-ArgumentList] <Object[]>] [-All] [-ListImported] [-Module <String[]>] [-Noun <String[]>] [-ParameterName <String[]>] [-ParameterType <PSTypeName[]>] [-Syntax] [-TotalCount <Int32>] [-Verb <String[]>] [<CommonParameters>] Get-Command [[-Name] <String[]>] [[-ArgumentList] <Object[]>] [-All] [-CommandType <CommandTypes>] [-ListImported] [-Module <String[]>] [-ParameterName <String[]>] [-ParameterType <PSTypeName[]>] [-Syntax] [-TotalCount <Int32>] [<CommonParameters>] The first parameter set is named CmdletSet, and the second parameter set is named AllCommandSet. If you type the following command, you will get a list of commands installed on your computer, including cmdlets, aliases, functions, workflows, filters, scripts, and applications: PowerCLI C:> Get-Command You can also specify the name of a specific cmdlet to get information about that cmdlet, as shown in the following command: PowerCLI C:> Get-Command –Name Get-VM This will return the following information about the Get-VM cmdlet: CommandType Name ModuleName ----------- ---- ---------- Cmdlet Get-VM VMware.VimAutomation.Core You see that the command returns the command type and the name of the module that contains the Get-VM cmdlet. CommandType, Name, and ModuleName are the properties that the Get-VM cmdlet returns by default. You will get more properties if you pipe the output to the Format-List cmdlet. The following screenshot will show you the output of the Get-Command –Name Get-VM | Format-List * command: You can use the Get-Command cmdlet to search for cmdlets. For example, if necessary, search for the cmdlets that are used for vSphere hosts. Type the following command: PowerCLI C:> Get-Command -Name *VMHost* If you are searching for the cmdlets to work with networks, use the following command: PowerCLI C:> Get-Command -Name *network* Using Get-VICommand PowerCLI has a Get-VICommand cmdlet that is similar to the Get-Command cmdlet. The Get-VICommand cmdlet is actually a function that creates a filter on the Get-Command output, and it returns only PowerCLI commands. Type the following command to list all the PowerCLI commands: PowerCLI C:> Get-VICommand The Get-VICommand cmdlet has only one parameter –Name. So, you can also type, for example, the following command to get information only about the Get-VM cmdlet: PowerCLI C:> Get-VICommand –Name Get-VM Using Get-Help To discover more information about cmdlets, you can use the Get-Help cmdlet. For example: PowerCLI C:> Get-Help Get-VM This will display the following information about the Get-VM cmdlet: The Get-Help cmdlet has some parameters that you can use to get more information. The –Examples parameter shows examples of the cmdlet. The –Detailed parameter adds parameter descriptions and examples to the basic help display. The –Full parameter displays all the information available about the cmdlet. And the –Online parameter retrieves online help information available about the cmdlet and displays it in a web browser. Since PowerShell V3, there is a new Get-Help parameter –ShowWindow. This displays the output of Get-Help in a new window. The Get-Help -ShowWindow command opens the following screenshot: Using Get-PowerCLIHelp The PowerCLI Get-PowerCLIHelp cmdlet opens a separate help window for PowerCLI cmdlets, PowerCLI objects, and articles. This is a very useful tool if you want to browse through the PowerCLI cmdlets or PowerCLI objects. The following screenshot shows the window opened by the Get-PowerCLIHelp cmdlet: Using Get-PowerCLICommunity If you have a question about PowerCLI and you cannot find the answer in this article, use the Get-PowerCLICommunity cmdlet to open the VMware vSphere PowerCLI section of the VMware VMTN Communities. You can log in to the VMware VMTN Communities using the same My VMware account that you used to download PowerCLI. First, search the community for an answer to your question. If you still cannot find the answer, go to the Discussions tab and ask your question by clicking on the Start a Discussion button, as shown later. You might receive an answer to your question in a few minutes. Using Get-Member In PowerCLI, you work with objects. Even a string is an object. An object contains properties and methods, which are called members in PowerShell. To see which members an object contains, you can use the Get-Member cmdlet. To see the members of a string, type the following command: PowerCLI C:> "Learning PowerCLI" | Get-Member Pipe an instance of a PowerCLI object to Get-Member to retrieve the members of that PowerCLI object. For example, to see the members of a virtual machine object, you can use the following command: PowerCLI C:> Get-VM | Get-Member TypeName: VMware.VimAutomation.ViCore.Impl.V1.Inventory.VirtualMachineImpl Name MemberType Definition ---- ---------- ---------- ConvertToVersion Method T VersionedObjectInterop.Conver... Equals Method bool Equals(System.Object obj) GetConnectionParameters Method VMware.VimAutomation.ViCore.Int... GetHashCode Method int GetHashCode() GetType Method type GetType() IsConvertableTo Method bool VersionedObjectInterop.IsC... LockUpdates Method void ExtensionData.LockUpdates() ObtainExportLease Method VMware.Vim.ManagedObjectReferen... ToString Method string ToString() UnlockUpdates Method void ExtensionData.UnlockUpdates() CDDrives Property VMware.VimAutomation.ViCore.Typ... Client Property VMware.VimAutomation.ViCore.Int... CustomFields Property System.Collections.Generic.IDic... DatastoreIdList Property string[] DatastoreIdList {get;} Description Property string Description {get;} DrsAutomationLevel Property System.Nullable[VMware.VimAutom... ExtensionData Property System.Object ExtensionData {get;} FloppyDrives Property VMware.VimAutomation.ViCore.Typ... Folder Property VMware.VimAutomation.ViCore.Typ... FolderId Property string FolderId {get;} Guest Property VMware.VimAutomation.ViCore.Typ... GuestId Property string GuestId {get;} HAIsolationResponse Property System.Nullable[VMware.VimAutom... HardDisks Property VMware.VimAutomation.ViCore.Typ... HARestartPriority Property System.Nullable[VMware.VimAutom... Host Property VMware.VimAutomation.ViCore.Typ... HostId Property string HostId {get;} Id Property string Id {get;} MemoryGB Property decimal MemoryGB {get;} MemoryMB Property decimal MemoryMB {get;} Name Property string Name {get;} NetworkAdapters Property VMware.VimAutomation.ViCore.Typ... Notes Property string Notes {get;} NumCpu Property int NumCpu {get;} PersistentId Property string PersistentId {get;} PowerState Property VMware.VimAutomation.ViCore.Typ... ProvisionedSpaceGB Property decimal ProvisionedSpaceGB {get;} ResourcePool Property VMware.VimAutomation.ViCore.Typ... ResourcePoolId Property string ResourcePoolId {get;} Uid Property string Uid {get;} UsbDevices Property VMware.VimAutomation.ViCore.Typ... UsedSpaceGB Property decimal UsedSpaceGB {get;} VApp Property VMware.VimAutomation.ViCore.Typ... Version Property VMware.VimAutomation.ViCore.Typ... VMHost Property VMware.VimAutomation.ViCore.Typ... VMHostId Property string VMHostId {get;} VMResourceConfiguration Property VMware.VimAutomation.ViCore.Typ... VMSwapfilePolicy Property System.Nullable[VMware.VimAutom... The command returns the full type name of the VirtualMachineImpl object and all its methods and properties. Remember that the properties are objects themselves. You can also use Get-Member to get the members of the properties. For example, the following command line will give you the members of the VMGuestImpl object: PowerCLI C:> $VM = Get-VM –Name vCenter PowerCLI C:> $VM.Guest | Get-Member Summary In this article, you looked at the Get-Help, Get-Command, and Get-Member cmdlets. Resources for Article: Further resources on this subject: Enhancing the Scripting Experience [article] Introduction to vSphere Distributed switches [article] Virtualization [article]
Read more
  • 0
  • 0
  • 3046
article-image-storage-practices-and-migration-hyper-v-2016
Packt
01 Dec 2016
17 min read
Save for later

Storage Practices and Migration to Hyper-V 2016

Packt
01 Dec 2016
17 min read
In this article by Romain Serre, the author of Hyper-V 2016 Best Practices, we will learn about Why Hyper-V projects fail, Overview of the failover cluster, Storage Replica, Microsoft System Center, Migrating VMware virtual machines, and Upgrading single Hyper-v hosts. (For more resources related to this topic, see here.) Why Hyper-V projects fail Before you start deploying your first production Hyper-V host, make sure that you have completed a detailed planning phase. I have been called in to many Hyper-V projects to assist in repairing what a specialist has implemented. Most of the time, I start by correcting the design because the biggest failures happen there, but are only discovered later during implementation. I remember many projects in which I was called in to assist with installations and configurations during the implementation phases, because these were the project phases where a real expert was needed. However, based on experience, this notion is wrong. Most critical to a successful design phase are two features—its rare existence and someone with technological and organizational experience with Hyper-V. If you don't have the latter, look out for a Microsoft Partner with a Gold Competency called Management and Virtualization on Microsoft Pinpoint (http://pinpoint.microsoft.com) and take a quick look at the reviews done by customers for successful Hyper-V projects. If you think it's expensive to hire a professional, wait until you hire an amateur. Having an expert in the design phase is the best way to accelerate your Hyper-V project. Overview of the failover cluster Before you start your first deployment in production, make sure you have defined the aim of the project and its smart criteria and have done a thorough analysis of the current state. After this, you should be able to plan the necessary steps to reach the target state, including a pilot phase. instantly detected. The virtual machines running on that particular node are powered off immediately because of the hardware failure on their computing node. The remaining cluster nodes then immediately take over these VMs in an unplanned failover process and start them on their respective own hardware. The virtual machines will be the backup running after a successful boot of their operating systems and applications in just a few minutes. Hyper-V Failover Clusters work under the condition that all compute nodes have access to a shared storage instance, holding the virtual machine configuration data and its virtual hard disks. In case of a planned failover, that is, for patching compute nodes, it's possible to move running virtual machines from one cluster node to another without interrupting the VM. All cluster nodes can run virtual machines at the same time, as long as there is enough failover capacity running all services when a node goes down. Even though a Hyper-V cluster is still called a Failover Cluster, utilizing the Windows Server Failover Clustering feature, it is indeed capable of running an Active/Active Cluster. To ensure that all these capabilities of a Failover Cluster are indeed working, it demands an accurate planning and implementation process. Storage Replica Storage Replica is a new feature in Windows Server 2016 that provides block replication from storage level for a data recovery plan or for a stretched cluster. Storage Replica can be used in the following scenarios: Server-to-server storage replication using Storage Replica Storage replication in a stretch cluster using Storage Replica Cluster-to-cluster storage replication using Storage Replica Server-to-itself to replicate between volumes using Storage Replica Regarding the scenario or the bandwidth and the latency of the inter-site link, you can choose between a synchronous and an asynchronous replication. For further information about Storage Replica, you can about read this topic at http://bit.ly/2albebS. The SMB3 protocol is used to make Storage Replica. You can leverage TCP/IP or RDMA on the network. I recommend you to implement RDMA when it is possible to reduce latency and CPU workload and to increase throughput. Compared to Hyper-V Replica, the Storage Replica feature provides a replication of all Virtual Machines stored in a volume from the block level. Moreover, Storage Replica can replicate in the synchronous mode while Hyper-V Replica is always in the asynchronous mode. To finish, with Hyper-V Replica you have to specify the failover IP Address because the replication is executed from the VM level, whereas with Storage Replica, you don't need to specify the failover IP Address, but in case of a replication between two clusters in two different rooms, the VM networks must be configured in the destination room. The use of Hyper-V Replica or Storage Replica depends on the disaster recovery plan you need. If you want to protect some application workloads, you can use Hyper-V Replica. On the other hand, if you have the passive room ready to restart in case of issues in the active room, Storage Replica can be a great solution because all the VMs in a volume will be already replicated. To deploy a replication between two clusters you need two sets of storage based on iSCSI, SAS JBOD, fibre channel SAN, or Shared VHDX. For better performance I recommend you to implement SSD that will be used for the logs of Storage Replica. Microsoft System Center Microsoft System Center 2016 is Microsoft's solution for advanced management of Windows Server and its components, along with its dependencies such as various hardware and software products. It consists of various components that support every stage of your IT services from planning to operating over backup to automation. System Center has existed since 1994 and has evolved continuously. It now offers a great set of tools for very efficient management of server and client infrastructures. It also offers the ability to create and operate whole clouds—run in your own data center or a public cloud data center such as Microsoft Azure. Today, it's your choice whether to run your workloads on-premises or off-premises. System Center provides a standardized set of tools for a unique and consistent Cloud OS management experience. System Center does not add any new features to Hyper-V, but it does offer great ways to make the most out of it and ensure streamlined operating processes after its implementation. System Center is licensed via the same model as Windows Server is, leveraging Standard and data center Editions on a physical host level. While every System Center component offers great value in itself, the binding of multiple components into a single workflow offers even more advantages, as shown in the following screenshot: System Center overview When do you need System Center? There is no right or wrong answer to this, and the most given answer by any IT consultant around the world is, "It depends". System Center adds value to any IT environment starting with only a few systems. In my experience, a Hyper-V environment with up to three hosts, and 15 VMs can be managed efficiently without the use of System Center. If you plan to use more hosts or virtual machines, System Center will definitely be a great solution for you. Let's take a look at the components of System Center. Migrating VMware virtual machines If you are running virtual machines on VMware ESXi hosts, there are really good options available for moving them to Hyper-V. There are different approaches on how to convert a VMware virtual machine to Hyper-V: from the inside of the VM on a guest level, running cold conversions with the VM powered off on the host level, running hot conversions on a running VM, and so on. I will give you a short overview of the currently available tools in the market. System Center VMM SCVMM should not be the first tool of your choice, take a look at MVMC combined with MAT to get an equal functionality from a better working tool. The earlier versions of SCVMM allowed online or offline conversions of VMs; the current version, 2016, allows only offline conversions of VMs. Select a powered-off VM on a VMware host or from the SCVMM library share to start the conversion. The VM conversion will convert VMware-hosted virtual machines through vCenter and ensure that the entire configuration, such as memory, virtual processor, and other machine configurations, is also migrated from the initial source. The tool also adds virtual NICs to the deployed virtual machine on Hyper-V. The VMware tools must be uninstalled before the conversion because you won't be able to remove the VMware tools when the VM is not running on a VMware host. SCVMM 2016 supports ESXi hosts running 4.1 and 5.1 but not the latest ESX Version 5.5. SCVMM conversions are great to automate through their integrated PowerShell support and it's very easy to install upgraded Hyper-V integration services as part of the setup or by adding any kind of automation through PowerShell or System Center Orchestrator. Besides manually removing VMware tools, using SCVMM is an end-to-end solution in the migration process. You can find some PowerShell examples for SCVMM-powered V2V conversion scripts at http://bit.ly/Y4bGp8. I don't recommend the use of this tool anymore because Microsoft doesn't spend time on this tool anymore. Microsoft Virtual Machine Converter Microsoft released its first version of the free solution accelerator Microsoft Virtual Machine Converter (MVMC) in 2013, and it should be available in Version 3.1 by the release of this book. MVMC provides a small and easy option to migrate selected virtual machines to Hyper-V. It takes a very similar approach to the conversion as SCVMM does. The conversion happens at a host level and offers a fully integrated end-to-end solution. MVMC supports all recent versions of VMware vSphere. It will even uninstall the VMware tools and install the Hyper-V integration services. MVMC 2.0 works with all supported Hyper-V guest operating systems, including Linux. MVMC comes with a full GUI wizard as well as a fully scriptable command-line interface (CLI). Besides being a free tool, it is fully supported by Microsoft in case you experience any problems during the migration process. MVMC should be the first tool of your choice if you do not know which tool to use. Like most other conversion tools, it does the actual conversion on the MVMC server itself and requires its disk space to host the original VMware virtual disk as well as the converted Hyper-V disk. MVMC even offers an add-on for VMware virtual center servers to start conversions directly from the vSphere console. The current release of MV is freely available at its official download site at http://bit.ly/1HbRIg7. Download MVMC to the conversion system and start the click-through setup. After finishing the download, start the MVMC with the GUI by executing Mvmc.Gui.exe. The wizard guides you through some choices. MVMC is not only capable of migrating to Hyper-V but also allows you to move virtual machines to Microsoft Azure. Follow these few steps to convert a VMware VM: Select Hyper-V as a target. Enter the name of the Hyper-V host you want this VM to run on and specify a fileshare to use and the format of the disks you want to create. Choosing the dynamically expanding disks should be the best option most of the time. Enter the name of the ESXi server you want to use as a source as well as valid credentials. Select the virtual machine to convert. Make sure it has VMware tools installed. The VM can be either powered on or off. Enter a workspace folder to store the converted disk. Wait for the process to finish. There is some additional guidance available at http://bit.ly/1vBqj0U. This is a great and easy way to migrate a single virtual machine. Repeat the steps for every other virtual machine you have, or use some automation. Upgrading single Hyper-V hosts If you are currently running a single host with an older version of Hyper-V and now want to upgrade this host on the same hardware, there is a limited set of decisions to be made. You want to upgrade the host with the least amount of downtime and without losing any data from your virtual machine. Before you start the upgrade process, make sure all components from your infrastructure are compatible with the new version of Hyper-V. Then it's time to prepare your hardware for this new version of Hyper-V by upgrading all firmware to the latest available version and downloading the necessary drivers for Windows Server 2016 with Hyper-V along with its installation media. One of the most crucial questions in this update scenario is whether you should use the integrated installation option called in-place upgrade, where the existing operating system will be transformed to the recent version of Hyper-V or delete the current operating system and perform a clean installation. While the installation experience of in-place upgrades works well when only the Hyper-V role is installed, based on experiences, some versions of upgraded systems are more likely to suffer problems. Numbers pulled from the Elanity support database show about 15 percent more support cases on upgraded systems from Windows Server 2008 R2 than clean installations. Remember how fast and easy it is nowadays to do a clean install of Hyper-V; this is why it is highly recommended to do this over upgrading existing installations. If you are currently using Windows Server 2012 R2 and want to upgrade to Windows Server 2016, note that we have not yet seen any differences in the number of support cases between the installation methods. However, clean installations of Hyper-V being so fast and easy, I barely use them. Before starting any type of upgrade scenario, make sure you have current backups of all affected virtual machines. Nonetheless, if you want to use the in-place upgrade, insert the Windows Server 2016 installation media and run this command from your current operating system: Setup.exe /auto:upgrade If it fails, it's most likely due to an incompatible application installed on the older operating system. Start the setup without the parameter to find out which applications need to be removed before executing the unattended setup. If you upgrade from Windows Server 2012 R2, there is no additional preparation needed; if you upgrade from older operating systems, make sure to remove all snapshots from your virtual machines. Importing virtual machines If you choose to do a clean installation of the operating systems, you do not necessarily have to export the virtual machines first; just make sure all VMs are powered off and are stored on a different partition than your Hyper-V host OS. If you are using a SAN, disconnect all LUNs before the installation and reconnect them afterwards to ensure their integrity through the installation process. After the installation process, just reconnect the LUNs and set the disk online in diskpart or in Disk Management at Control Panel | Computer Management. If you are using local disks, make sure not to reformat the partition with your virtual machines on it. You should export VM to another location and import them back after reformatting; more efforts are required but it is safer. Set the partition online and then reimport the virtual machines. Before you start the reimport process, make sure all dependencies of your virtual machines are available, especially vSwitches. To import a single Hyper-V VM, use the following PowerShell cmdlet: Import-VM -Path 'D:VMsVM01Virtual Machines2D5EECDA-8ECC-4FFC- ACEE-66DAB72C8754.xml' To import all virtual machines from a specific folder, use this command: Get-ChildItem d:VMs -Recurse -Filter "Virtual Machines" | %{Get- ChildItem $_.FullName -Filter *.xml} | %{import-vm $_.FullName - Register} After that, all VMs are registered and ready for use on your new Hyper-V hosts. Make sure to update the Hyper-V integration services of all virtual machines before going back into production. If you still have virtual disks in the old .vhd format, it's now time to convert them to .vhdx files. Use this PowerShell cmdlet on powered-off VMs or standalone vDisk to convert a single .vhd file: Convert-VHD –Path d:VMstestvhd.vhd –DestinationPath d:VMstestvhdx.vhdx If you want to convert the disks of all your VMs, fellow MVPs, Aidan Finn and Didier van Hoye, provided a great end-to-end solution to achieve this. This can be found at http://bit.ly/1omOagi. I often hear from customers that they don't want to upgrade their disks, so as to be able to revert to older versions of Hyper-V when needed. First, you should know that I have never met a customer who has done that because there really is no technical reason why anyone should do this. Second, even if you would do this backwards move, running virtual machines on older Hyper-V hosts is not supported, if they had been deployed on more modern versions of Hyper-V before. The reason for this is very simple; Hyper-V does not offer a way for downgrading Hyper-V integration services. The only way to move a virtual machine back to an older Hyper-V host is by restoring a backup of the VM made earlier before the upgrade process. Exporting virtual machines If you want to use another physical system running a newer version of Hyper-V, you have multiple possible options. They are as follows: When using a SAN as a shared storage, make sure all your virtual machines, including their virtual disks, are located on other LUNs rather than on the host operating system. Disconnect all LUNs hosting virtual machines from the source host and connect them to the target host. Bulk import the VMs from the specified folders. When using SMB3 shared storage from scale-out file servers, make sure to switch access to the shared hosting VMs to the new Hyper-V hosts. When using local hard drives and upgrading from Windows Server 2008 SP2 or Windows Server 2008 R2 with Hyper-V, it's necessary to export the virtual machines to a storage location reachable from the new host. Hyper-V servers running legacy versions of the OS (prior to 2012 R2) need to power off the VMs before an export can occur. To export a virtual machine from a host, use the following PowerShell cmdlet: Export-VM –Name VM –Path D: To export all virtual machines to a folder underneath the following root, use the following command: Get-VM | Export-VM –Path D: In most cases, it is also possible to just copy the virtual machine folders containing virtual hard disk's and configuration files to the target location and import them to Windows Server 2016 Hyper-V hosts. However, the export method is more reliable and should be preferred. A good alternative for moving virtual machines can be the recreation of virtual machines. If you have another host up and running with a recent version of Hyper-V, it may be a good opportunity to also upgrade some guest OSes. For instance, Windows Server 2003 and 2003 R2 are out of extended support since July 2015. Depending on your applications, it may now be the right choice to create new virtual machines with Windows Server 2016 as a guest operating system and migrate your existing workloads from older VMs to these new machines. Summary In this article, we learned about Why Hyper-V projects fail, how to migrate VMware virtual machine and also about upgrading single Hyper-V hosts. This article also covers the overview of the failover cluster and Storage Replica. Resources for Article: Further resources on this subject: Hyper-V Basics [article] The importance of Hyper-V Security [article] Hyper-V building blocks for creating your Microsoft virtualization platform [article]
Read more
  • 0
  • 0
  • 3948

article-image-configuration-guide
Packt
08 Nov 2016
13 min read
Save for later

A Configuration Guide

Packt
08 Nov 2016
13 min read
In this article by Dimitri Aivaliotis, author Mastering NGINX - Second Edition, The NGINX configuration file follows a very logical format. Learning this format and how to use each section is one of the building blocks that will help you to create a configuration file by hand. Constructing a configuration involves specifying global parameters as well as directives for each individual section. These directives and how they fit into the overall configuration file is the main subject of this article. The goal is to understand how to create the right configuration file to meet your needs. (For more resources related to this topic, see here.) The basic configuration format The basic NGINX configuration file is set up in a number of sections. Each section is delineated as shown: <section> { <directive> <parameters>; } It is important to note that each directive line ends with a semicolon (;). This marks the end of line. The curly braces ({}) actually denote a new configuration context, but we will read these as sections for the most part. The NGINX global configuration parameters The global section is used to configure the parameters that affect the entire server and is an exception to the format shown in the preceding section. The global section may include configuration directives, such as user and worker_processes, as well as sections, such as events. There are no open and closing braces ({}) surrounding the global section. The most important configuration directives in the global context are shown in the following table. These configuration directives will be the ones that you will be dealing with for the most part. Global configuration directives Explanation user The user and group under which the worker processes run is configured using this parameter. If the group is omitted, a group name equal to that of the user is used. worker_processes This directive shows the number of worker processes that will be started. These processes will handle all the connections made by the clients. Choosing the right number depends on the server environment, the disk subsystem, and the network infrastructure. A good rule of thumb is to set this equal to the number of processor cores for CPU-bound loads and to multiply this number by 1.5 to 2 for the I/O bound loads. error_log This directive is where all the errors are written. If no other error_log is given in a separate context, this log file will be used for all errors, globally. A second parameter to this directive indicates the level at which (debug, info, notice, warn, error, crit, alert, and emerg) errors are written in the log. Note that the debug-level errors are only available if the --with-debug configuration switch is given at compilation time. pid This directive is the file where the process ID of the main process is written, overwriting the compiled-in default. use This directive indicates the connection processing method that should be used. This will overwrite the compiled-in default and must be contained in an events context, if used. It will not normally need to be overridden, except when the compiled-in default is found to produce errors over time. worker_connections This directive configures the maximum number of simultaneous connections that a worker process may have opened. This includes, but is not limited to, client connections and connections to upstream servers. This is especially important on reverse proxy servers—some additional tuning may be required at the operating system level in order to reach this number of simultaneous connections. Here is a small example using each of these directives: # we want nginx to run as user 'www' user www; # the load is CPU-bound and we have 12 cores worker_processes 12; # explicitly specifying the path to the mandatory error log error_log /var/log/nginx/error.log; # also explicitly specifying the path to the pid file pid /var/run/nginx.pid; # sets up a new configuration context for the 'events' module events { # we're on a Solaris-based system and have determined that nginx # will stop responding to new requests over time with the default # connection-processing mechanism, so we switch to the second-best use /dev/poll; # the product of this number and the number of worker_processes # indicates how many simultaneous connections per IP:port pair are # accepted worker_connections 2048; } This section will be placed at the top of the nginx.conf configuration file. Using the include files The include files can be used anywhere in your configuration file, to help it be more readable and to enable you to reuse parts of your configuration. To use them, make sure that the files themselves contain the syntactically correct NGINX configuration directives and blocks; then specify a path to those files: include /opt/local/etc/nginx/mime.types; A wildcard may appear in the path to match multiple files: include /opt/local/etc/nginx/vhost/*.conf; If the full path is not given, NGINX will search relative to its main configuration file. A configuration file can easily be tested by calling NGINX as follows: nginx -t -c <path-to-nginx.conf> This command will test the configuration, including all files separated out into the include files, for syntax errors. Sample configuration The following code is an example of an HTTP configuration section: http { include /opt/local/etc/nginx/mime.types; default_type application/octet-stream; sendfile on; tcp_nopush on; tcp_nodelay on; keepalive_timeout 65; server_names_hash_max_size 1024; } This context block would go after any global configuration directives in the nginx.conf file. The virtual server section Any context beginning with the keyword server is considered as a virtual server section. It describes a logical separation of a set of resources that will be delivered under a different server_name directive. These virtual servers respond to the HTTP requests, and are contained within the http section. A virtual server is defined by a combination of the listen and server_name directives. The listen directive defines an IP address/port combination or path to a UNIX-domain socket: listen address[:port]; listen port; listen unix:path; The listen directive uniquely identifies a socket binding under NGINX. There are a number of optional parameters that listen can take: The listen parameters Explanation Comments default_server This parameter defines this address/port combination as being the default value for the requests bound here.   setfib This parameter sets the corresponding FIB for the listening socket. This parameter is only supported on FreeBSD and not for UNIX-domain sockets. backlog This parameter sets the backlog parameter in the listen() call. This parameter defaults to -1 on FreeBSD and 511 on all other platforms. rcvbuf This parameter sets the SO_RCVBUF parameter on the listening socket.   sndbuf This parameter sets the SO_SNDBUF parameter on the listening socket.   accept_filter This parameter sets the name of the accept filter to either dataready or httpready. This parameter is only supported on FreeBSD. deferred This parameter sets the TCP_DEFER_ACCEPT option to use a deferred accept() call. This parameter is only supported on Linux. bind This parameter makes a separate bind() call for this address/port pair. A separate bind() call will be made implicitly if any of the other socket-specific parameters are used. ipv6only This parameter sets the value of the IPV6_ONLY parameter. This parameter can only be set on a fresh start and not for UNIX-domain sockets. ssl This parameter indicates that only the HTTPS connections will be made on this port. This parameter allows for a more compact configuration. so_keepalive This parameter configures the TCP keepalive connection for the listening socket.   The server_name directive is fairly straightforward and it can be used to solve a number of configuration problems. Its default value is "", which means that a server section without a server_name directive will match a request that has no Host header field set. This can be used, for example, to drop requests that lack this header: server { listen 80; return 444; } The nonstandard HTTP code, 444, used in this example will cause NGINX to immediately close the connection. Besides a normal string, NGINX will accept a wildcard as a parameter to the server_name directive: The wildcard can replace the subdomain part: *.example.com The wildcard can replace the top-level domain part: www.example.* A special form will match the subdomain or the domain itself: .example.com (matches *.example.com as well as example.com) A regular expression can also be used as a parameter to server_name by prepending the name with a tilde (~): server_name ~^www.example.com$; server_name ~^www(d+).example.(com)$; The latter form is an example using captures, which can later be referenced (as $1, $2, and so on) in further configuration directives. NGINX uses the following logic when determining which virtual server should serve a specific request: Match the IP address and port to the listen directive. Match the Host header field against the server_name directive as a string. Match the Host header field against the server_name directive with a wildcard at the beginning of the string. Match the Host header field against the server_name directive with a wildcard at the end of the string. Match the Host header field against the server_name directive as a regular expression. If all the Host headers match fail, direct to the listen directive marked as default_server. If all the Host headers match fail and there is no default_server, direct to the first server with a listen directive that satisfies step 1. This logic is expressed in the following flowchart: The default_server parameter can be used to handle requests that would otherwise go unhandled. It is therefore recommended to always set default_server explicitly so that these unhandled requests will be handled in a defined manner. Besides this usage, default_server may also be helpful in configuring a number of virtual servers with the same listen directive. Any directives set here will be the same for all matching server blocks. Locations – where, when, and how The location directive may be used within a virtual server section and indicates a URI that comes either from the client or from an internal redirect. Locations may be nested with a few exceptions. They are used for processing requests with as specific configuration as possible. A location is defined as follows: location [modifier] uri {...} Or it can be defined for a named location: location @name {…} A named location is only reachable from an internal redirect. It preserves the URI as it was before entering the location block. It may only be defined at the server context level. The modifiers affect the processing of a location in the following way: Location modifiers Handling = This modifier uses exact match and terminate search. ~ This modifier uses case-sensitive regular expression matching. ~* This modifier uses case-insensitive regular expression matching. ^~ This modifier stops processing before regular expressions are checked for a match of this location's string, if it's the most specific match. Note that this is not a regular expression match—its purpose is to preempt regular expression matching. When a request comes in, the URI is checked against the most specific location as follows: Locations without a regular expression are searched for the most-specific match, independent of the order in which they are defined. Regular expressions are matched in the order in which they are found in the configuration file. The regular expression search is terminated on the first match. The most-specific location match is then used for request processing. The comparison match described here is against decoded URIs; for example, a %20 instance in a URI will match against a "" (space) specified in a location. A named location may only be used by internally redirected requests. The following directives are found only within a location: Location-only directives Explanation alias This directive defines another name for the location, as found on the filesystem. If the location is specified with a regular expression, alias should reference captures defined in that regular expression. The alias directive replaces the part of the URI matched by the location such that the rest of the URI not matched will be searched for in that filesystem location. Using the alias directive is fragile when moving bits of the configuration around, so using the root directive is preferred, unless the URI needs to be modified in order to find the file. internal This directive specifies a location that can only be used for internal requests (redirects defined in other directives, rewrite requests, error pages, and so on.) limit_except This directive limits a location to the specified HTTP verb(s) (GET also includes HEAD). Additionally, a number of directives found in the http section may also be specified in a location. Refer to Appendix A, Directive Reference, for a complete list. The try_files directive deserves special mention here. It may also be used in a server context, but will most often be found in a location. The try_files directive will do just that—try files in the order given as parameters; the first match wins. It is often used to match potential files from a variable and then pass processing to a named location, as shown in the following example: location / { try_files $uri $uri/ @mongrel; } location @mongrel { proxy_pass http://appserver; } Here, an implicit directory index is tried if the given URI is not found as a file and then processing is passed on to appserver via a proxy. We will explore how best to use location, try_files, and proxy_pass to solve specific problems throughout the rest of the article. Locations may be nested except in the following situations: When the prefix is = When the location is a named location Best practice dictates that regular expression locations be nested inside the string-based locations. An example of this is as follows: # first, we enter through the root location / { # then we find a most-specific substring # note that this is not a regular expression location ^~ /css { # here is the regular expression that then gets matched location ~* /css/.*.css$ { } } } Summary In this article, we saw how the NGINX configuration file is built. Its modular nature is a reflection, in part, of the modularity of NGINX itself. A global configuration block is responsible for all aspects that affect the running of NGINX as a whole. There is a separate configuration section for each protocol that NGINX is responsible for handling. We may further define how each request is to be handled by specifying servers within those protocol configuration contexts (either http or mail) so that requests are routed to a specific IP address/port. Within the http context, locations are then used to match the URI of the request. These locations may be nested, or otherwise ordered to ensure that requests get routed to the right areas of the filesystem or application server. Resources for Article: Further resources on this subject: Zabbix Configuration [article] Configuring the essential networking services provided by pfSense [article] Configuring a MySQL linked server on SQL Server 2008 [article]
Read more
  • 0
  • 0
  • 2331