Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon

Tech Guides

852 Articles
article-image-tools-to-stay-completely-anonymous-online
Guest Contributor
12 Jul 2018
8 min read
Save for later

10 great tools to stay completely anonymous online

Guest Contributor
12 Jul 2018
8 min read
Everybody is facing a battle these days. Though it may not be immediately apparent, it is already affecting a majority of the global population. This battle is not fought with bombs, planes, or tanks or with any physical weapons for that matter. This battle is for our online privacy. A survey made last year discovered 69% of data breaches were related to identity theft. Another survey shows the number of cases of data breaches related to identity theft has steadily risen over the last 4 years worldwide. And it is likely to increase as hackers are gaining easy access more advanced tools. The EU’s GDPR may curb this trend by imposing stricter data protection standards on data controllers and processors. These entities have been collecting and storing our data for years through ads that track our online habits-- another reason to protect our online anonymity. However, this new regulation has only been in force for over a month and only within the EU. So, it's going to take some time before we feel its long-term effects. The question is, what should we do when hackers out there try to steal and maliciously use our personal information? Simple: We defend ourselves with tools at our disposal to keep ourselves completely anonymous online. So, here’s a list you may find useful. 1. VPNs A VPN helps you maintain anonymity by hiding your real IP and internet activity from prying eyes. Normally, your browser sends a query tagged with your IP every time you make an online search. Your ISP takes this query and sends it to a DNS server which then points you to the correct website. Of course, your ISP (and all the servers your query had to go through) can, and will likely, view and monitor all the data you course through them-- including your personal information and IP address. This allows them to keep a tab on all your internet activity. A VPN protects your identity by assigning you an anonymous IP and encrypting your data. This means that any query you send to your ISP will be encrypted and no longer display your real IP. This is why using a VPN is one of the best ways to keeping anonymous online. However, not all VPNs are created equal. You have to choose the best one if you want airtight security. Also, beware of free VPNs. Most of them make money by selling your data to advertisers. You’ll want to compare and contrast several VPNs to find the best one for you. But, that’s sooner said than done with so many different VPNs out there. Look for reviews on trustworthy sites to find the best vpn for your needs. 2. TOR Browser The Onion Router (TOR) is a browser that strengthens your online anonymity even more by using different layers of encryption-- thereby protecting your internet activity which includes “visits to Web sites, online posts, instant messages, and other communication forms”. It works by first encasing your data in three layers of encryption. Your data is then bounced three times-- each bounce taking off one layer of encryption. Once your data gets to the right server, it “puts back on” each layer it has shed as it successively bounces back to your device. You can even improve TOR by using it in combination with a compatible VPN. It is important to note, though, that using TOR won’t hide the fact that you’re using it. Some sites may restrict allowances made through TOR. 3. Virtual machine A Virtual machine is basically a second computer within your computer. It lets you emulate another device through an application. This emulated computer can then be set according to your preferences. The best use for this tool, however, is for tasks that don’t involve an internet connection. It is best used for when you want to open a file and want to make sure no one is watching over your shoulder. After opening the file, you then simply delete the virtual machine. You can try VirtualBox which is available on Windows, Linux, and Mac. 4. Proxy servers A proxy server is an intermediary between your device and the internet. It’s basically another computer that you use to process internet requests. It’s similar to a virtual machine in concept but it’s an entirely separate physical machine. It protects your anonymity in a similar way a VPN does (by hiding your IP) but it can also send a different user agent to keep your browser unidentifiable and block or accept cookies but keep them from passing to your device. Most VPN companies also offer proxy servers so they’re a good place to look for a reliable one. 5. Fake emails A fake email is exactly what the name suggests: an email that isn’t linked to your real identity. Fake emails aid your online anonymity by not only hiding your real identity but by making sure to keep you safe from phishing emails or malware-- which can be easily sent to you via email. Making a fake email can be as easy as signing up for an email without using your real information or by using a fake email service. 6. Incognito mode “Going incognito” is the easiest anonymity tool to come by. Your device will not store any data at all while in this mode including: your browsing history, cookies, site data, and information entered in forms. Most browsers have a privacy mode that you can easily use to hide your online activity from other users of the same device. 7. Ad blockers Ads are everywhere these days. Advertising has and always will be a lucrative business. That said, there is a difference between good ads and bad ads. Good ads are those that target a population as a whole. Bad ads (interest-based advertising, as their companies like to call it) target each of us individually by tracking our online activity and location-- which compromises our online privacy. Tracking algorithms aren’t illegal, though, and have even been considered “clever”. But, the worst ads are those that contain malware that can infect your device and prevent you from using it. You can use ad blockers to combat these threats to your anonymity and security. Ad blockers usually come in the form of browser extensions which instantly work with no additional configuration needed. For Google Chrome, you can choose either Adblock Plus, uBlock Origin, or AdBlock. For Opera, you can choose either Opera Ad Blocker, Adblock Plus, or uBlock Origin. 8. Secure messaging apps If you need to use an online messaging app, you should know that the popular ones aren’t as secure as you’d like them to be. True, Facebook messenger does have a “secret conversation” feature but Facebook hasn’t exactly been the most secure social network to begin with. Instead, use tools like Signal or Telegram. These apps use end-to-end encryption and can even be used to make voice calls. 9. File shredder The right to be forgotten has surfaced in mainstream media with the onset of the EU’s General Data Protection Regulation. This right basically requires data collecting or processing entities to completely remove a data subject’s PII from their records. You can practice this same right on your own device by using a “file shredding” tool. But the the thing is: Completely removing sensitive files from your device is hard. Simply deleting it and emptying your device’s recycle bin doesn’t actually remove the file-- your device just treats the space it filled up as empty and available space. These “dead” files can still haunt you when they are found by someone who knows where to look. You can use software like Dr. Cleaner (for Mac) or Eraser (for Win) to “shred” your sensitive files by overwriting them several times with random patterns of random sets of data. 10. DuckDuckGo DuckDuckGo is a search engine that doesn’t track your behaviour (like Google and Bing that use behavioural trackers to target you with ads). It emphasizes your privacy and avoids the filter bubble of personalized search results. It offers useful features like region-specific searching, Safe Search (to protect against explicit content), and an instant answer feature which shows an answer across the top of the screen apart from the search results. To sum it up: Our online privacy is being attacked from all sides. Ads legally track our online activities and hackers steal our personal information. The GDPR may help in the long run but that remains to be seen. What's important is what we do now. These tools will set you on the path to a more secure and private internet experience today. About the Author Dana Jackson, an U.S. expat living in Germany and the founder of PrivacyHub. She loves all things related to security and privacy. She holds a degree in Political Science, and loves to call herself a scientist. Dana also loves morning coffee and her dog Paw.   [divider style="normal" top="20" bottom="20"] Top 5 cybersecurity trends you should be aware of in 2018 Twitter allegedly deleted 70 million fake accounts in an attempt to curb fake news Top 5 cybersecurity myths debunked  
Read more
  • 0
  • 4
  • 25211

article-image-cloud-pricing-comparison-aws-vs-azure
Guest Contributor
02 Feb 2019
11 min read
Save for later

Cloud pricing comparison: AWS vs Azure

Guest Contributor
02 Feb 2019
11 min read
On average, businesses waste about 35% of their cloud spend due to inefficiently using their cloud resources. This amounts to more than $10 billion in wasted cloud spend across just the top three public cloud providers. Although the unmatched compute power, data storage options and efficient content delivery systems of the leading public cloud providers can support incredible business growth, this can cause some hubris. It’s easy to lose control of costs when your cloud provider appears to be keeping things running smoothly. To stop this from happening, it’s essential to adopt a new approach to how we manage - and optimize - cloud spend. It’s not an easy thing to do, as pricing structures can be complicated. However, in this post, we’ll look at how both AWS and Azure structure their pricing, and how you can best determine what’s right for you. Different types of cloud pricing schemes Broadly, the pricing model for cloud services can range from a pure subscription-based model, where services are charged based on a cloud catalog and users are billed per month, per mailbox, or app license ordered. In this instance, subscribers are billed for all the resources to which they are subscribed, irrespective of whether they are used or not. The other option is pay-as-you-go. This is where subscribers begin with a billing amount set at 0, which then grows with the services and resources they use.. Amazon uses the Pay-As-You-Go model, charging a predetermined price for every hour of virtual machine resources used. Such a model is also used by other leading cloud service providers including Microsoft Azure and Google’s Google Cloud Platform. Another variant of cloud pricing is an enterprise billing service. This is based on the number of active users assigned to a particular cloud subscription. Microsoft Azure is a leading cloud provider that offers cloud subscription for its customers. Most cloud providers offer varying combinations of the above three models with attractive discount options built-in. These include: What free tier services do AWS and Azure offer? Both AWS and Azure offer a ‘free tier’ service for new and initial subscribers. This is for potential long-time subscribers to test out the service before committing for the long run. For AWS, Amazon allows subscribers to try out most of AWS’ services free for a year, including RDS, S3, EC2, Elastic Block Store, Elastic Load Balancing (EBS) and other AWS services. For example, you can utilize EC2 and EBS on the free tier to host a website for a whole year. EBS pricing will be zero unless your usage exceeds the limit of 30GB of storage. The free tier for the EC2 includes 730 hours of a t2.micro instance. Azure offers similar deals for new users. Azure’s services like App Service, Virtual Machines, Azure SQL Database, Blob Storage and Azure Kubernetes Service (AKS) are free for the initial period of 12 months. Additionally, Azure provides the ‘Functions’ compute service (for serverless) at 1 million requests free every month throughout the subscription. This is useful if you want to give serverless a try. AWS and Azure’s pay-as-you-go, on-demand pricing models Under the pay-as-you-go model, AWS and Azure offer subscribers the option to simply settle their bills at the end of every month without any upfront investment. This is a good option if you want to avoid a long-term and binding contract. Most resources are available on demand and charged on a per hour basis, and costs are calculated based on the number of hours the resource was used. For data storage and data transfer, the rates are generally calculated per Gigabyte. Subscribers are notified 30 days in advance for any changes in the Pay As You Go rates as well as when new services are added periodically to the platform. Reserve-and-pay-less pricing model In addition to the on-demand pricing model, Amazon AWS has an alternate scheme called Reserved Instance (RI) that allows the subscriber to reserve capacity for specific products. RI offers discounted hourly rates and capacity reservation for its EC2 and RDS services. A subscriber can reserve a resource and can save up to 75% of total billing costs in the long run. These discounted rates are automatically added to the subscriber’s AWS bills. Subscribers have the option to reserve instances either for a 1-year or a 3-year term. Microsoft Azure offers to help subscribers save up to 72% of their billing costs compared to its pay-as-you-go model when subscribers sign up for one to three-year terms for Windows and Linux virtual machines (VMs). Microsoft also allows for added flexibility in the sense that if your business needs change, you can cancel your Azure RI subscription at any time and return the remaining unused RI to Microsoft for an early termination fee. Use-more-and-pay-less pricing model In addition to the above payment options, AWS offers subscribers one additional payment option. When it comes to data transfer and data storage services, AWS gives discounts based on the subscriber’s usage. These volume-based discounts help subscribers realize critical savings as their usage increases. Subscribers can benefit from the economies of scale, allowing their businesses to grow while costs are kept relatively under control. AWS also gives subscribers the option to sign up for services that help their growing business. As an example, AWS’ storage services offer subscribers with opportunities to lower pricing based on how frequently data is accessed and performance needed in the retrieval process. For EC2, you can get a discount of up to 10% if you reserve more. The image below demonstrates the pricing of the AWS S3 bucket based on usage. Comparing Cloud Pricing on Azure and AWS As the major cloud service providers – Amazon Web Services, Azure, Google Cloud Platform and IBM – continually decrease prices of cloud instances, provide new and innovative discount options, include additional instances, and drop billing increments. In some cases, especially, Microsoft Azure, per second billing has also been introduced. However, as costs decrease, the complexity increases. It is paramount for subscribers to understand and efficiently navigate this complexity. We take a crack at it here. Reserved Instance Pricing Given the availability of Reserved Instances by Azure, AWS and GCP have also introduced publicly available discounts, some reaching up to 75%. This is in exchange for signing up to use the services of the particular cloud service provider for a one year to 3 year period. We’ve briefly covered this in the section above. Before signing up, however, subscribers need to understand the amount of usage they are committing to and how much of usage to leave as an ‘on-demand’ option. To do this, subscribers need to consider many different factors – Historical usage – by region, instance type, etc Steady-state vs. part-time usage An estimate of usage growth or decline Probability of switching cloud service providers Choosing alternative computing models like serverless, containers, etc. On-Demand Instance Pricing On-Demand Instances work best for applications that have short-term, irregular workloads but critical enough as to not be interrupted. For instance, if you’re running cron jobs on a periodic basis that lasts for a few hours, you can move them to on-demand instances. Each On-Demand Instance is billed per instance hour from time it is launched until it is terminated. These are most useful during the testing or development phase of applications. On-demand instances are available in many varying levels of computing power, designed for different tasks executed within the cloud environment. These on-demand instances have no binding contractual commitments and can be used as and when required. Generally, on-demand instances are among the most expensive purchasing options for instances. Each on-demand instance is billed at a per instance hour from the time it is launched until it is stopped or terminated. If partial instance hours are used, these are rounded up to the full hour during billing. The chart below shows the on-demand price per hour for AWS and Azure cloud services and the hourly price for each GB of RAM. VM Type AWS OD Hourly Azure OD Hourly AWS OD / GB RAM Azure OD / GB RAM Standard 2 vCPU w Local SSD $0.133 $0.100 $0.018 $0.013 Standard 2 vCPU no local disk $0.100 $0.100 $0.013 $0.013 Highmem 2 vCPU w Local SSD $0.166 $0.133 $0.011 $0.008 Highmem 2 vCPU no local disk $0.133 $0.133 $0.009 $0.008 Highcpu 2 vCPU w Local SSD $0.105 $0.085 $0.028 $0.021 Highcpu 2 vCPU no local disk $0.085 $0.085 $0.021 $0.021   The on-demand price of Azure instances is cheaper compared to AWS for certain VM types. The price difference is evident for instances with local SSD. Discounted Cloud Instance Pricing When it comes to discounted cloud pricing, it is important to remember that this comes with a lock-in period of 1 – 3 years. Therefore, it would work best for organizations that are more stable and have a good idea of what their historical cloud usage is and can fairly accurately predict what cloud services they would require over the next 12 month period. In the table below, we have looked at annual costs of both AWS and Azure. VM Type AWS 1 Y RI Annual Azure 1 Y RI Annual AWS 1 Y RI Annual / GB RAM Azure 1 Y RI Annual / GB RAM Standard 2 vCPU w Local SSD $867 $508 $116 $64 Standard 2 vCPU no local disk $622 $508 $78 $64 Highmem 2 vCPU w Local SSD $946 $683 $63 $43 Highmem 2 vCPU no local disk $850 $683 $56 $43 Highcpu 2 vCPU w Local SSD $666 $543 $178 $136 Highcpu 2 vCPU no local disk $543 $543 $136 $136 Azure’s rates are clearly better than Amazon’s pricing and by a good margin. Azure offers better-discounted rates for Standard, Highmem and High CPU compute instances.   Optimizing Cloud Pricing Subscribers need to move beyond short-term, one time fixes and make use of automation to continuously monitor their spend, raise alerts for over or underuse of service and also take an automated action based on a predetermined condition. Here are some of the ways you can optimize your cloud spending: Cloud Pricing Calculators Cloud Pricing tools enable you to list the different parameters for your AWS or Azure subscriptions. You can use these tools to calculate an approximate monthly cost that would likely be incurred. AWS Simple Monthly Calculator You can try the official cloud pricing calculators from AWS and Azure or a third-party pricing calculator. Calculators help you to optimize your pricing based on your requirements. For example, if you have a long-term requirement for running instances, and if you’re currently running them using on-demand pricing schemes, cloud calculators can offer better insights into reserved-instance schemes and other ways that you can improve your cloud expenditure. For instance, this Azure calculator by NetApp offers more price optimization option. This includes options to tier less frequently used data to storage objects like Azure Blob and customize snapshot creation and storage efficiency. Zerto is another popular calculator for Azure and AWS with a simpler interface. However, note that the estimated cost is based on current pricing and is subject can be liable to change. Price List API Historically, for potential users to narrow down on the final usage cost involved a considerable amount of manual rate checks. They involve collecting price points, and checking and cross-referencing them manually. In the case of AWS, the Price List API offers programmatic access, which is especially beneficial to designers who can now query the AWS price list instead of searching manually through the web. To make matters more natural, the queries can be constructed into simple code in any language. Azure offers a similar billing API to gain insights into your Azure usage programmatically. Summary Understanding and optimizing cloud pricing is somewhat challenging with AWS and Azure. This is partially because they offer hundreds of features with different pricing options and new features are added to the pipeline every week. To solve some of these complexities, we’ve covered some of the popular ways to tackle pricing in AWS and Azure. Here’s a list of things that we’ve covered: How the cloud pricing works and the different pricing schemes in AWS and Azure Comparison of different instance pricing options in AWS and Azure which includes reserved instance, on-demand instances, and discounted instances. Third-party tools like calculators for optimizing price. Price list API for AWS and Azure. If you have any thoughts to share, feel free to post it in the comments. About the author Gilad David Maayan Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Oracle, Zend, CheckPoint and Ixia. Gilad is a 3-time winner of international technical communication awards, including the STC Trans-European Merit Award and the STC Silicon Valley Award of Excellence. Over the past 7 years, Gilad has headed Agile SEO, which performs strategic search marketing for leading technology brands. Together with his team, Gilad has done market research, developer relations, and content strategy in 39 technology markets, lending him a broad perspective on trends, approaches, and ecosystems across the tech industry. Cloud computing trends in 2019 The 10 best cloud and infrastructure conferences happening in 2019 Bo Weaver on Cloud security, skills gap, and software development in 2019  
Read more
  • 0
  • 0
  • 25096

article-image-python-data-stack
Akram Hussain
31 Oct 2014
3 min read
Save for later

Python Data Stack

Akram Hussain
31 Oct 2014
3 min read
The Python programming language has grown significantly in popularity and importance, both as a general programming language and as one of the most advanced providers of data science tools. There are 6 key libraries every Python analyst should be aware of, and they are: 1 - NumPY NumPY: Also known as Numerical Python, NumPY is an open source Python library used for scientific computing. NumPy gives both speed and higher productivity using arrays and metrics. This basically means it's super useful when analyzing basic mathematical data and calculations. This was one of the first libraries to push the boundaries for Python in big data. The benefit of using something like NumPY is that it takes care of all your mathematical problems with useful functions that are cleaner and faster to write than normal Python code. This is all thanks to its similarities with the C language. 2 - SciPY SciPY: Also known as Scientific Python, is built on top of NumPy. SciPy takes scientific computing to another level. It’s an advanced form of NumPy and allows users to carry out functions such as differential equation solvers, special functions, optimizers, and integrations. SciPY can be viewed as a library that saves time and has predefined complex algorithms that are fast and efficient. However, there are a plethora of SciPY tools that might confuse users more than help them. 3 - Pandas Pandas is a key data manipulation and analysis library in Python. Pandas strengths lie in its ability to provide rich data functions that work amazingly well with structured data. There have been a lot of comparisons between pandas and R packages due to their similarities in data analysis, but the general consensus is that it is very easy for anyone using R to migrate to pandas as it supposedly executes the best features of R and Python programming all in one. 4 - Matplotlib Matplotlib is a visualization powerhouse for Python programming, and it offers a large library of customizable tools to help visualize complex datasets. Providing appealing visuals is vital in the fields of research and data analysis. Python’s 2D plotting library is used to produce plots and make them interactive with just a few lines of code. The plotting library additionally offers a range of graphs including histograms, bar charts, error charts, scatter plots, and much more. 5 - scikit-learn scikit-learn is Python’s most comprehensive machine learning library and is built on top of NumPy and SciPy. One of the advantages of scikit-learn is the all in one resource approach it takes, which contains various tools to carry out machine learning tasks, such as supervised and unsupervised learning. 6 - IPython IPython makes life easier for Python developers working with data. It’s a great interactive web notebook that provides an environment for exploration with prewritten Python programs and equations. The ultimate goal behind IPython is improved efficiency thanks to high performance, by allowing scientific computation and data analysis to happen concurrently using multiple third-party libraries. Continue learning Python with a fun (and potentially lucrative!) way to use decision trees. Read on to find out more.
Read more
  • 0
  • 0
  • 25041

article-image-newsql-what-hype-about
Amey Varangaonkar
06 Nov 2017
6 min read
Save for later

NewSQL: What the hype is all about

Amey Varangaonkar
06 Nov 2017
6 min read
First, there was data. Data became database. Then came SQL. Next came NoSQL. And now comes NewSQL. NewSQL Origins For decades, relational database or SQL was the reigning data management standard in enterprises all over the world. With the advent of Big Data and cloud-based storage rose the need for a faster, more flexible and scalable data management system, which didn’t necessarily comply with the SQL standards of ACID compliance. This was popularly dubbed as NoSQL, and databases like MongoDB, Neo4j, and others gained prominence in no time. We can attribute the emergence and eventual adoption of NoSQL databases to a couple of very important factors. The high costs and lack of flexibility of the traditional relational databases drove many SQL users away. Also, NoSQL databases are mostly open source, and their enterprise versions are comparatively cheaper too. They are schema-less meaning they can be used to manage unstructured data effectively. In addition, they can scale well horizontally - i.e. you could add more machines to increase computing power and use it to handle high volumes of data. All these features of NoSQL come with an important tradeoff, however - these systems can’t simultaneously ensure total consistency. Of late, there has been a rise in another type of database systems, with the aim to combine ‘the best of both the worlds’. Popularly dubbed as ‘NewSQL’, this system promises to combine the relational data model of SQL and the scalability and speed of NoSQL. NewSQL - The dark horse in the databases race NewSQL is ‘SQL on Steroids’, say many. This is mainly because all NewSQL systems start with the relational data model and the SQL query language, but also incorporate the features that have led to the rise of NoSQL - addressing the issues of scalability, flexibility, and high performance. They offer the assurance of ACID transactions like in the relational models. However, what makes them really unique is that they allow the horizontal scaling functionality of NoSQL, and can process large volumes of data with high performance and reliability. This is why businesses really like the concept of NewSQL - the performance of NoSQL and the reliability and consistency of the SQL model, all packed in one. To understand what the hype surrounding NewSQL is all about, it’s worth comparing NewSQL database systems with the traditional SQL and NoSQL database systems, and see where they stand out: Characteristic Relational (SQL) NoSQL NewSQL ACID compliance Yes No Yes OLTP/OLAP support Yes No Yes Rigid Schema Structure Yes No In some cases Support for unstructured data No Yes In some cases Performance with large data Moderate Fast Very fast Performance overhead Huge Moderate Minimal Support from Community Very high High Low   As we can see from the table above, NewSQL really comes through as the best when you’re dealing with larger datasets with a desire to lower performance overheads. To give you a practical example, consider an organization that has to work with a large number of short transactions, access a limited amount of data, but executes those queries repeatedly. For such organizations, a NewSQL database system would be a perfect fit. These features are leading to the gradual growth of NewSQL systems. However, it will take some time for more industries to adopt them. Not all NewSQL databases are created equal Today, one has a host of NewSQL solutions to choose from. Some popular solutions are Clustrix, MemSQL, VoltDB and CockroachDB.  Cloud Spanner, the latest NewSQL offering by Google, became generally available in February 2017 - indicating Google’s interest in the NewSQL domain and the value a NewSQL database can offer to their existing cloud offerings. It is important to understand that there are significant differences among these various NewSQL solutions. As such you should choose a NewSQL solution carefully after evaluating your organization’s data requirements and problems. As this article on Dataconomy points out, while some databases handle transactional workloads well, they do not offer the benefit of native clustering - SAP HANA is one such example. NuoDB focuses on cloud deployments, but its overall throughput is found to be rather sub-par. MemSQL is a suitable choice when it comes to clustered analytics but falls short when it comes to consistency. Thus, the choice of the database purely depends on the task you want to do, and what trade-offs you are ready to allow without letting it affect your workflow too much. DBAs and Programmers in the NewSQL world Regardless of which database system an enterprise adopts, the role of DBAs will continue to be important going forward. Core database administration and maintenance tasks such as backup, recovery, replication, etc. will need to be taken care of. The major challenge for the NewSQL DBAs will be in choosing and then customizing the right database solution that fits the organizational requirements. Some degree of capacity planning and overall database administration skills might also have to be recalibrated. Likewise, NewSQL database programmers may find themselves dealing with data manipulation and querying tasks similar to those faced while working with traditional database systems. But NewSQL programmers will be doing these tasks at a much larger, or shall we say, at a more ‘distributed’ scale. In conclusion When it comes to solving a particular problem related to data management, it’s often said that 80% of the solution comes down to selecting the right tool, and 20% is about understanding the problem at hand! In order to choose the right database system for your organization, you must ask yourself these two questions: What is the nature of the data you will work with? What are you willing to trade-off? In other words, how important are factors such as the scalability and performance of the database system? For example, if you primarily work with mostly transactional data with a priority on high performance and high scalability, then NewSQL databases might fit your bill just perfectly. If you’re going to work with volatile data, NewSQL might help you there as well, however, there are better NoSQL solutions to tackle your data problem. As we have seen earlier, NewSQL databases have been designed to combine the advantages and power of both relational and NoSQL systems. It is important to know that NewSQL databases are not designed to replace either NoSQL or SQL relational models. They are rather intentionally-built alternatives for data processing, which mask the flaws and shortcomings of both relational and nonrelational database systems. The ultimate goal of NewSQL is to deliver a high performance, highly available solution to handle modern data, without compromising on data consistency and high-speed transaction capabilities.
Read more
  • 0
  • 0
  • 25031

article-image-why-should-enterprises-use-splunk
Sunith Shetty
25 Jul 2018
4 min read
Save for later

Why should enterprises use Splunk?

Sunith Shetty
25 Jul 2018
4 min read
Splunk is a multinational software company that offers its core platform, Splunk Enterprise, as well as many related offerings built on the Splunk platform. The platform helps a wide variety of organizational personas, such as analysts, operators, developers, testers, managers, and executives. They get analytical insights from machine-created data. It collects, stores, and provides powerful analytical capabilities, enabling organizations to act on often powerful insights derived from this data. The Splunk Enterprise platform was built with IT operations in mind. When companies had IT infrastructure problems, troubleshooting and solving problems was immensely difficult, complicated, and manual. It was built to collect and make log files from IT systems searchable and accessible. It is commonly used for information security and development operations, as well as more advanced use cases for custom machines, Internet of Things, and mobile devices. Most organizations will start using Splunk in one of three areas: IT operations management, information security, or development operations (DevOps). In today's post, we will understand the thoughts, concepts, and ideas to apply Splunk to an organization level. This article is an excerpt from a book written by J-P Contreras, Erickson Delgado and Betsy Page Sigman titled Splunk 7 Essentials, Third Edition. IT operations IT operations have moved from predominantly being a cost center to also being a revenue center. Today, many of the world's oldest companies also make money based on IT services and/or systems. As a result, the delivery of these IT services must be monitored and, ideally, proactively remedied before failures occur. Ensuring that hardware such as servers, storage, and network devices are functioning properly via their log data is important. Organizations can also log and monitor mobile and browser-based software applications for any issues from software. Ultimately, organizations will want to correlate these sets of data together to get a complete picture of IT Health. In this regard, Splunk takes the expertise accumulated over the years and offers a paid-for application known as IT Server Intelligence (ITSI) to help give companies a framework for tackling large IT environments. Complicating matters for many traditional organizations is the use of Cloud computing technologies, which now drive log captured from both internally and externally hosted systems. Cybersecurity With the relentless focus in today's world on cybersecurity, there is a good chance your organization will need a tool such as Splunk to address a wide variety of Information Security needs as well. It acts as a log data consolidation and reporting engine, capturing essential security-related log data from devices and software, such as vulnerability scanners, phishing prevention, firewalls, and user management and behavior, just to name a few. Companies need to ensure they are protected from external as well as internal threats, and as a result offer the paid-for applications enterprise security and User behavior analytics (UBA). Similar to ITSI, these applications deliver frameworks to help companies meet their specific requirements in these areas. In addition to cyber-security to protect the business, often companies will have to comply with, and audit against, specific security standards, which can be industry-related, such as PCI compliance of financial transactions; customer-related, such as National Institute of Standards and Technologies (NIST) requirements in working with the the US government; or data privacy-related, such as the Health Insurance Portability and Accountability Act (HIPAA) or the European Union's General Data Protection Regulation (GPDR). Software development and support operations Commonly referred to as DevOps, Splunk's ability to ingest and correlate data from many sources solves many challenges faced in software development, testing, and release cycles. Using Splunk will help teams provide higher quality software more efficiently. Then, with the controls into the software in place, it will provide visibility into released software, its use and user behavior changes, intended or not. This set of use cases is particularly applicable to organizations that develop their own software. Internet of Things Many organizations today are looking to build upon the converging trends in computing, mobility and wireless communications and data to capture data from more and more devices. Examples can include data captured from sensors placed on machinery such as wind turbines, trains, sensors, heating, and cooling systems. These sensors provide access to the data they capture in standard formats such as JavaScript Object Notation (JSON) through application programming interfaces (APIs). To summarize, we saw how Splunk can be used at an organizational level for IT operations, cybersecurity, software development and support and the IoTs. To know more about how Splunk can be used to make informed decisions in areas such as IT operations, information security, and the Internet of Things., do checkout this book Splunk 7 Essentials, Third Edition. Create a data model in Splunk to enable interactive reports and dashboards Splunk leverages AI in its monitoring tools Splunk Industrial Asset Intelligence (Splunk IAI) targets Industrial IoT marketplace
Read more
  • 0
  • 0
  • 24691

article-image-deploy-self-service-business-intelligence-qlik-sense
Amey Varangaonkar
31 May 2018
7 min read
Save for later

Best practices for deploying self-service BI with Qlik Sense

Amey Varangaonkar
31 May 2018
7 min read
As part of a successful deployment of Qlik Sense, it is important IT recognizes self-service Business Intelligence to have its own dynamics and adoption rules. The various use cases and subsequent user groups thus need to be assessed and captured. Governance should always be present but power users should never get the feeling that they are restricted. Once they are won over, the rest of the traction and the adoption of other user types is very easy. In this article, we will look at the most important points to keep in mind while deploying self-service with Qlik Sense. The following excerpt is taken from the book Mastering Qlik Sense, authored by Martin Mahler and Juan Ignacio Vitantonio. This book demonstrates useful techniques to design useful and highly profitable Business Intelligence solutions using Qlik Sense. Here's the list of points to be kept in mind: Qlik Sense is not QlikView Not even nearly. The biggest challenge and fallacy is that the organization was sold, by Qlik or someone else, just the next version of the tool. It did not help at all that Qlik itself was working for years on Qlik Sense under the initial product name Qlik.Next. Whatever you are being told, however, it is being sold to you, Qlik Sense is at best the cousin of QlikView. Same family, but no blood relation. Thinking otherwise sets the wrong expectation so the business gives the wrong message to stakeholders and does not raise awareness to IT that self-service BI cannot be deployed in the same fashion as guided analytics, QlikView in this case. Disappointment is imminent when stakeholders realize Qlik Sense cannot replicate their QlikView dashboards. Simply installing Qlik Sense does not create a self-service BI environment Installing Qlik Sense and giving users access to the tool is a start but there is more to it than simply installing it. The infrastructure requires design and planning, data quality processing, data collection, and determining who intends to use the platform to consume what type of data. If data is not available and accessible to the user, data analytics serve no purpose. Make sure a data warehouse or similar is in place and the business has a use case for self-service data analytics. A good indicator for this is when the business or project works with a lot of data, and there are business users who have lots of Excel spreadsheets lying around analyzing it in different ways. That’s your best case candidate for Qlik Sense. IT to monitor Qlik Sense environment rather control IT needs to unlearn to learn new things and the same applies when it comes to deploying self-service. Create a framework with guidelines and principles and monitor that users are following it, rather than limiting them in their capabilities. This framework needs to have the input of the users as well and to be elastic. Also, not many IT professionals agree with giving away too much power to the user in the development process, believing this leads to chaos and anarchy. While the risk is there, this fear needs to be overcome. Users love data analytics, and they are keen to get the help of IT to create the most valuable dashboard possible and ensure it will be well received by a wide audience. Identifying key users and user groups is crucial For a strong adoption of the tool, IT needs to prepare the environment and identify the key power users in the organization and to win them over to using the technology. It is important they are intensively supported, especially in the beginning, and they are allowed to drive how the technology should be used rather than having principles imposed on them. Governance should always be present but power users should never get the feeling they are restricted by it. Because once they are won over, the rest of the traction and the adoption of other user types is very easy. Qlik Sense sells well–do a lot of demos Data analytics, compelling visualizations, and the interactivity of Qlik Sense is something almost everyone is interested in. The business wants to see its own data aggregated and distilled in a cool and glossy dashboard. Utilize the momentum and do as many demos as you can to win advocates of the technology and promote a consciousness of becoming a data-driven culture in the organization. Even the simplest Qlik Sense dashboards amaze people and boost their creativity for use cases where data analytics in their area could apply and create value. Promote collaboration Sharing is caring. This not only applies to insights, which naturally are shared with the excitement of having found out something new and valuable, but also to how the new insight has been derived. People keep their secrets on the approach and methodology to themselves, but this is counterproductive. It is important that applications, visualizations, and dashboards created with Qlik Sense are shared and demonstrated to other Qlik Sense users as frequently as possible. This not only promotes a data-driven culture but also encourages the collaboration of users and teams across various business functions, which would not have happened otherwise. They could either be sharing knowledge, tips, and tricks or even realizing they look at the same slices of data and could create additional value by connecting them together. Market the success of Qlik Sense within the organization If Qlik Sense has had a successful achievement in a project, tell others about it. Create a success story and propose doing demos of the dashboard and its analytics. IT has been historically very bad in promoting their work, which is counterproductive. Data analytics creates value and there is nothing embarrassing about boasting about its success; as Muhammad Ali suggested, it’s not bragging if it’s true. Introduce guidelines on design and terminology Avoiding the pitfalls of having multiple different-looking dashboards by promoting a consistent branding look across all Qlik Sense dashboards and applications, including terminology and best practices. Ensure the document is easily accessible to all users. Also, create predesigned templates with some sample sheets so the users duplicate them and modify them to their liking and extend them, applying the same design. Protect less experienced users from complexities Don’t overwhelm users if they have never developed in their life. Approach less technically savvy users in a different way by providing them with sample data and sample templates, including a library of predefined visualizations, dimensions, or measures (so-called Master Key Items). Be aware that what is intuitive to Qlik professionals or power users is not necessarily intuitive to other users – be patient and appreciative of their feedback, and try to understand how a typical business user might think. For a strong adoption of the tool, IT needs to prepare the environment and identify the key power users in the organization and win them over to using the technology. It is important they are intensively supported, especially in the beginning, and they are allowed to drive how the technology should be used rather than having principles imposed on them. If you found the excerpt useful, make sure you check out the book Mastering Qlik Sense to learn more of these techniques on efficient Business Intelligence using Qlik Sense. Read more How Qlik Sense is driving self-service Business Intelligence Overview of a Qlik Sense® Application’s Life Cycle What we learned from Qlik Qonnections 2018
Read more
  • 0
  • 0
  • 23904
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-8-ways-artificial-intelligence-can-improve-devops
Prasad Ramesh
01 Sep 2018
6 min read
Save for later

8 ways Artificial Intelligence can improve DevOps

Prasad Ramesh
01 Sep 2018
6 min read
DevOps combines development and operations in an agile manner. ITOps refers to network infrastructure, computer operations, and device management. AIOps is artificial intelligence applied to ITOps, a term coined by Gartner. Makes us wonder what AI applied to DevOps would look like. Currently, there are some problem areas in DevOps that mainly revolve around data. Namely, accessing the large pool of data, taking actions on it, managing alerts etc. Moreover, there are errors caused by human intervention. AI works heavily with data and can help improve DevOps in numerous ways. Before we get into how AI can improve DevOps, let’s take a look at some of the problem areas in DevOps today. The trouble with DevOps Human errors: When testing or deployment is performed manually and there is an error, it is hard to repeat and fix. Many a time, the software development is outsourced in companies. In such cases, there is lack of coordination between the dev and ops teams. Environment inconsistency: Software functionality breaks when the code moves to different environments as each environment has different configurations. Teams can run around wasting a lot of time due to bugs when the software works fine on one environment but not on the other. Change management: Many companies have change management processes well in place, but they are outdated for DevOps. The time taken for reviews, passing a new module etc is manual and proves to be a bottleneck. Changes happen frequently in DevOps and the functioning suffers due to old processes. Monitoring: Monitoring is key to ensure smooth functioning in Agile. Many companies do not have the expertise to monitor the pipeline and infrastructure. Moreover monitoring only the infrastructure is not enough, there also needs to be monitoring of application performance, solutions need to be logged and analytics need to be tracked. Now let’s take a look at 8 ways AI can improve DevOps given the above context. 1. Better data access One of the most critical issues faced by DevOps teams is the lack of unregulated access to data. There is also a large amount of data, while the teams rarely view all of the data and focus on the outliers. The outliers only work as an indicator but do not give robust information. Artificial intelligence can compile and organize data from multiple sources for repeated use. Organized data is much easier to access and understand than heaps of raw data. This will help in predictive analysis and eventually a better decision making process. This is very important and enables many other ways listed below. 2. Superior implementation efficiency Artificially intelligent systems can work with minimal or no human intervention. Currently, a rules-based environment managed by humans is followed in DevOps teams. AI can transform this into self governed systems to greatly improve operational efficiency. There are limitations to the volume and complexity of analysis a human can perform. Given the large volumes of data to be analyzed and processed, AI systems being good at it, can set optimal rules to maximize operational efficiencies. 3. Root cause analysis Conducting root cause analysis is very important to fix an issue permanently. Not getting to the root cause allows for the cause to persist and affect other areas further down the line. Often, engineers don’t investigate failures in depth and are more focused on getting the release out. This is not surprising given the limited amount of time they have to work with. If fixing a superficial area gets things working, the root cause is not found. AI can take all data into account and see patterns between activity and cause to find the root cause of failure. 4 Automation Complete automation is a problem in DevOps, many tasks in DevOps are routine and need to be done by humans. An AI model can automate these repeatable tasks and speed up the process significantly. A well-trained model increases the scope of complexity of the tasks that can be automated by machines. AI can help achieve least human intervention so that developers can focus on more complex interactive problems. Complete automation also allows the errors to be reproduced and fixed promptly. 5 Reduce Operational Complexity AI can be used to simplify operations by providing a unified view. An engineer can view all the alerts and relevant data produced by the tools in a single place. This improves the current scenario where engineers have to switch between different tools to manually analyze and correlate data. Alert prioritization, root cause analysis, evaluating unusual behavior are complex time consuming tasks that depend on data. An organized singular view can greatly benefit in looking up data when required. “AI and machine learning makes it possible to get a high-level view of the tool-chain, but at the same time zoom in when it is required.” -SignifAI 6 Predicting failures A critical failure in a particular tool/area in DevOps can cripple the process and delay cycles. With enough data, machine learning models can predict when an error can occur. This goes beyond simple predictions. If an occurred fault is known to produce certain readings, AI can read patterns and predict the signs failure. AI can see indicators that humans may not be able to. As such early failure prediction and notification enable the team to fix it before it can affect the software development life cycle (SDLC). 7 Optimizing a specific metric AI can work towards solutions where the uptime is maximized. An adaptive machine learning system can learn how the system works and improve it. Improving could mean tweaking a specific metric in the workflow for optimized performance. Configurations can be changed by AI for optimal performance as required during different production phases. Real-time analysis plays a big part in this. 8 Managing Alerts DevOps systems can be flooded with alerts which are hard for humans to read and act upon. AI can analyze these alerts in real-time and categorize them. Assigning priority to alerts helps teams towards work on fixing them rather than going through a long list of alerts. The alerts can simply be tagged by a common ID for specific areas or AI can be trained for classifying good and bad alerts. Prioritizing alerts in such a way that flaws are shown first to be fixed will help smooth functioning. Conclusion As we saw, most of these areas depend heavily on data. So getting the system right to enhance data accessibility is the first step to take. Predictions work better when data is organized, performing root cause analysis is also easier. Automation can repeat mundane tasks and allow engineers to focus on more interactive problems that machines cannot handle. With machine learning, the overall operation efficiency, simplicity, and speed can be improved for smooth functioning of DevOps teams. Why Agile, DevOps and Continuous Integration are here to stay: Interview with Nikhil Pathania, DevOps practitioner Top 7 DevOps tools in 2018 GitLab’s new DevOps solution
Read more
  • 0
  • 0
  • 23893

article-image-self-service-business-intelligence-qlik-sense-users
Amey Varangaonkar
29 May 2018
7 min read
Save for later

Four self-service business intelligence user types in Qlik Sense

Amey Varangaonkar
29 May 2018
7 min read
With the introduction of self-service to BI, there is segmentation at various levels and breaths on how self-service is conducted and to what extent. There are, quite frankly, different user types that differ from each other in level of interest, technical expertise, and the way in which they consume data. While each user will almost be unique in the way they use self-service, the user base can be divided into four different groups. In this article, we take a look at the four types of users in self-service business intelligence model. The following excerpt is taken from the book Mastering Qlik Sense, authored by Martin Mahler and Juan Ignacio Vitantonio. This book presents expert techniques to design and deploy enterprise-grade Business Intelligence solutions for your business, by leveraging the power of Qlik Sense. Power Users or Data Champions Power users are the most tech-savvy business users, who show a great interest in self-service BI. They produce and build dashboards themselves and know how to load data and process it to create a logical data model. They tend to be self-learning and carry a hybrid set of skills, usually a mixture of business knowledge and some advanced technical skills. This user group is often frustrated with existing reporting or BI solutions and finds IT inadequate in delivering the same. As a result, especially in the past, they take away data dumps from IT solutions and create their own dashboards in Excel, using advanced skills such as VBA, Visual Basic for Applications. They generally like to participate in the development process but have been unable to do so due to governance rules and a strict old-school separation of IT from the business. Self-service BI is addressing this group in particular, and identifying those users is key in reaching adoption within an organization. Within an established self-service environment, power users generally participate in committees revolving around the technical environments and represent the business interest. They also develop the bulk of the first versions of the apps, which, as part of a naturally evolving process, are then handed over to more experienced IT for them to be polished and optimized. Power users advocate the self-service BI technology and often not only demo the insights and information they achieved to extract from their data, but also the efficiency and timeliness of doing so. At the same time, they also serve as the first point of contact for other users and consumers when it comes to questions about their apps and dashboards. Sometimes they also participate in a technical advisory capacity on whether other projects are feasible to be implemented using the same technology. Within a self-service BI environment, it is safe to say that those power users are the pillars of a successful adoption. Business Users or Data Visualizers Users are frequent users of data analytics, with the main goal to extract value from the data they are presented with. They represent the group of the user base which is interested in conducting data analysis and data discovery to better understand their business in order to make better-informed decisions. Presentation and ease of use of the application are key to this type of user group and they are less interested in building new analytics themselves. That being said, some form of creating new charts and loading data is sometimes still of interest to them, albeit on a very basic level. Timeliness, the relevance of data, and the user experience are most relevant to them. They are the ones who are slicing and dicing the data and drilling down into dimensions, and who are keen to click around in the app to obtain valuable information. Usually, a group of users belong to the same department and have a power user overseeing them with regard to questions but also in receiving feedback on how the dashboard can be improved even more. Their interaction with IT is mostly limited to requesting access and resolving unexpected technical errors. Consumers or Data Readers Consumers usually form the largest user group of a self-service BI analytics solution. They are the end recipients of the insights and data analytics that have been produced and, normally, are only interested in distilled information which is presented to them in a digested form. They are usually the kind of users who are happy with a report, either digital or in printed form, which summarizes highlights and lowlights in a few pages, requiring no interaction at all. Also, they are most sensitive to the timeliness and availability of their reports. While usually the largest audience, at the same time this user group leverages the self-service capabilities of a BI tool the least. This poses a licensing challenge, as those users don’t take full advantage of the functionality on offer, but are costing the full amount in order to access the reports. It is therefore not uncommon to assign this type of user group a bucket of login access passes or not give them access to the self-service BI platform at all and give them the information they need in (digitally) printed format or within presentations, prepared by users. IT or Data Overseers IT represents the technical user group within this context, who sit in the background and develop and manage the framework within which the self-service BI solution operates. They are the backbone of the deployment and ensure the environment is set up correctly to cater for the various use cases required by the above-described user groups. At the same time, they ensure a security policy is in place and maintained and they introduce a governance framework for deployment, data quality, and best practices. They are in effect responsible for overseeing the power users and helping them with technical questions, but at the same time ensuring terms and definition as well as the look and feel is consistent and maintained across all apps. With self-service BI, IT plays a lesser role in actually developing the dashboards but assumes a more mentoring position, where training, consultation, and advisory in best practices are conducted. While working closely with power users, IT also provides technical support to users and liaises with the IT infrastructure to ensure the server infrastructure is fit for purpose and up and running to serve the users. This also includes upgrading the platform where required and enriching it with additional functionality if and when available. Bringing them together The previous four groups can be distinguished within a typical enterprise environment; however, this is not to say hybrid or fewer user groups are not viable models for self-service BI. It is an evolutionary process in how an organization adapts self-service data analytics with a lot of dependencies on available skills, competing established solutions, culture, and appetite on new technologies. It usually begins with IT being the first users in a newly deployed self-service environment, not only setting up the infrastructure but also developing the first apps for a couple of consumers. Power users then follow up; generally, they are the business sponsors themselves who are often big fans of data analytics, modifying the app to their liking and promoting it to their users. The user base emerges with the success of the solution, where analytics are integrated into their business as the usual process. The last group, the consumers, is mostly the last type of user group that is established, which more often than not doesn’t have actual access to the platform itself, but rather receives printouts, email summaries with screenshots, or PowerPoint presentations. Due to licensing cost and the size of the consumer audience, it is not always easy to give them access to the self-service platform; hence, most of the time, an automated and streamlined PDF printing process is the most elegant solution to cater to this type of user group. At the same time, the size of the deployment also determines the number of various user groups. In small enterprise environments, it will be mostly power users and IT who will be using self-service. This greatly simplifies the approach as well as the setup considerations. If you found the above excerpt useful, make sure you check out the book Mastering Qlik Sense to learn helpful tips and tricks to perform effective Business Intelligence using Qlik Sense. Read more: How Qlik Sense is driving self-service Business Intelligence What we learned from Qlik Qonnections 2018 How self-service analytics is changing modern-day businesses
Read more
  • 0
  • 0
  • 23125

article-image-write-python-code-or-pythonic-code
Aaron Lazar
08 Aug 2018
5 min read
Save for later

Do you write Python Code or Pythonic Code?

Aaron Lazar
08 Aug 2018
5 min read
If you’re new to Programming, and Python in particular, you might have heard the term Pythonic being brought up at tech conferences, meetups and even at your own office. You might have also wondered why the term and whether they’re just talking about writing Python code. Here we’re going to understand what the term Pythonic means and why you should be interested in learning how to not just write Python code, rather write Pythonic code. What does Pythonic mean? When people talk about pythonic code, they mean that the code uses Python idioms well, that it’s natural or displays fluency in the language. In other words, it means the most widely adopted idioms that are adopted by the Python community. If someone said you are writing un-pythonic code, they might actually mean that you are attempting to write Java/C++ code in Python, disregarding the Python idioms and performing a rough transcription rather than an idiomatic translation from the other language. Okay, now that you have a theoretical idea of what Pythonic (and unpythonic) means, let’s have a look at some Pythonic code in practice. Writing Pythonic Code Before we get into some examples, you might be wondering if there’s a defined way/method of writing Pythonic code. Well, there is, and it’s called PEP 8. It’s the official style guide for Python. Example #1 x=[1, 2, 3, 4, 5, 6] result = [] for idx in range(len(x)); result.append(x[idx] * 2) result Output: [2, 4, 6, 8, 10, 12] Consider the above code, where you’re trying to multiply some elements, “x” by 2. So, what we did here was, we created an empty list to store the results. We would then append the solution of the computation into the result. The result now contains a function which is 2 multiplied by each of the elements. Now, if you were to write the same code in a Pythonic way, you might want to simply use list comprehensions. Here’s how: x=[1, 2, 3, 4, 5, 6] [(element * 2) for element in x] Output: [2, 4, 6, 8, 10] You might have noticed, we skipped the entire for loop! Example #2 Let’s make the previous example a bit more complex, and place a condition that the elements should be multiplied by 2 only if they are even. x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] result = [] for idx in range(len(x)); if x[idx] % 2 == 0; result.append(x[idx] * 2) else; result.append(x[idx]) result Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20] We’ve actually created an if else statement to solve this problem, but there is a simpler way of doing things the Pythonic way. [(element * 2 if element % 2 == 0 else element) for element in x] Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20] If you notice what we’ve done here, apart from skipping multiple lines of code, is that we used the if-else statement in the same sentence. Now, if you wanted to perform filtering, you could do this: x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [element * 2 for element in x if element % 2 == 0] Output: [4, 8, 12, 16, 20] What we’ve done here is put the if statement after the for declaration, and Voila! We’ve achieved filtering. If you’re using a nice IDE like Jupyter Notebooks or PyCharm, they will help you format your code as per the PEP 8 suggestions. Why should you write Pythonic code? Well firstly, you’re saving loads of time writing humongous piles of cowdung code, so you’re obviously becoming a smarter and more productive programmer. Python is a pretty slow language, and when you’re trying to do something in Python, which is acquired from another language like Java or C++, you’re going to worsen things. With idiomatic, Pythonic code, you’re improving the speed of your programs. Moreover, idiomatic code is far easier to comprehend and understand for other developers who are working on the same code. It helps a great deal when you’re trying to refactor someone else’s code. Fearing Pythonic idioms Well, I don’t mean the idioms themselves are scary. Rather, quite a few developers and organisations have begun discriminating on the basis of whether someone can or cannot write Pythonic code. This is wrong, because, at the end of the day, though the PEP 8 exists, the idea of the term Pythonic is different for different people. To some it might mean picking up a new style guide and improving the way you code. To others, it might mean being succinct and not repeating themselves. It’s time we stopped judging people on whether they can or can’t write Pythonic code and instead, we should appreciate when someone is able to present readable, easily maintainable and succinct code. If you find them writing a bit of clumsy code, you can choose to talk to them about improving their design considerations. And the world will be a better place! If you’re interested in learning how to write more succinct and concise Python code, check out these resources: Learning Python Design Patterns - Second Edition Python Design Patterns [Video] Python Tips, Tricks and Techniques [Video]    
Read more
  • 0
  • 2
  • 22489

article-image-top-4-business-intelligence-tools
Ed Bowkett
04 Dec 2014
4 min read
Save for later

Top 4 Business Intelligence Tools

Ed Bowkett
04 Dec 2014
4 min read
With the boom of data analytics, Business Intelligence has taken something of a front stage in recent years, and as a result, a number of Business Intelligence (BI) tools have appeared. This allows a business to obtain a reliable set of data, faster and easier, and to set business objectives. This will be a list of the more prominent tools and will list advantages and disadvantages of each. Pentaho Pentaho was founded in 2004 and offers a suite, among others, of open source BI applications under the name, Pentaho Business Analytics. It has two suites, enterprise and community. It allows easy access to data and even easier ways of visualizing this data, from a variety of different sources including Excel and Hadoop and it covers almost every platform ranging from mobile, Android and iPhone, through to Windows and even Web-based. However with the pros, there are cons, which include the Pentaho Metadata Editor in Pentaho, which is difficult to understand, and the documentation provided offers few solutions for this tool (which is a key component). Also, compared to other tools, which we will mention below, the advanced analytics in Pentaho need improving. However, given that it is open source, there is continual improvement. Tableau Founded in 2003, Tableau also offers a range of suites, focusing on three products: Desktop, Server, and Public. Some benefits of using Tableau over other products include ease of use and a pretty simple UI involving drag and drop tools, which allows pretty much everyone to use it. Creating a highly interactive dashboard with various sources to obtain your data from is simple and quick. To sum up, Tableau is fast. Incredibly fast! There are relatively few cons when it comes to Tableau, but some automated features you would usually expect in other suites aren’t offered for most of the processes and uses here. Jaspersoft As well as being another suite that is open source, Jaspersoft ships with a number of data visualization, data integration, and reporting tools. Added to the small licensing cost, Jaspersoft is justifiably one of the leaders in this area. It can be used with a variety of databases including Cassandra, CouchDB, MongoDB, Neo4j, and Riak. Other benefits include ease of installation and the functionality of the tools in Jaspersoft is better than most competitors on the market. However, the documentation has been claimed to have been lacking in helping customers dive deeper into Jaspersoft, and if you do customize it the customer service can no longer assist you if it breaks. However, given the functionality/ability to extend it, these cons seem minor. Qlikview Qlikview is one of the oldest Business Intelligence software tools in the market, having been around since 1993, it has multiple features, and as a result, many pros and cons that include ones that I have mentioned for previous suites. Some advantages of Qlikview are that it takes a very small amount of time to implement and it’s incredibly quick; quicker than Tableau in this regard! It also has 64-bit in-memory, which is among the best in the market. Qlikview also has good data mining tools, good features (having been in the market for a long time), and a visualization function. These aspects make it so much easier to deal with than others on the market. The learning curve is relatively small. Some cons in relation to Qlikview include that while Qlikview is easy to use, Tableau is seen as the better suite to use to analyze data in depth. Qlikview also has difficulties integrating map data, which other BI tools are better at doing. This list is not definitive! It lays out some open source tools that companies and individuals can use to help them analyze data to prepare business performance KPIs. There are other tools that are used by businesses including Microsoft BI tools, Cognos, MicroStrategy, and Oracle Hyperion. I’ve chosen to explore some BI tools that are quick to use out of the box and are incredibly popular and expanding in usage.
Read more
  • 0
  • 0
  • 21625
article-image-famous-gang-of-four-design-patterns
Sugandha Lahoti
10 Jul 2018
14 min read
Save for later

Meet the famous 'Gang of Four' design patterns

Sugandha Lahoti
10 Jul 2018
14 min read
A design pattern is a reusable solution to a recurring problem in software design. It is not a finished piece of code but a template that helps to solve a particular problem or family of problems. In this article, we will talk about the Gang of Four design patterns. The gang of four, authors Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, initiated the concept of Design Pattern in Software development. These authors are collectively known as Gang of Four (GOF). We are going to focus on the design patterns from the Scala point of view. All different design patterns can be grouped into the following types: Creational Structural Behavioral These three groups contain the famous Gang of Four design patterns.  In the next few subsections, we will explain the main characteristics of the listed groups and briefly present the actual design patterns that fall under them. This article is an excerpt from Scala Design Patterns - Second Edition by Ivan Nikolov. In this book, you will learn how to write efficient, clean, and reusable code with Scala. Creational design patterns The creational design patterns deal with object creation mechanisms. Their purpose is to create objects in a way that is suitable to the current situation, which could lead to unnecessary complexity and the need for extra knowledge if they were not there. The main ideas behind the creational design patterns are as follows: Knowledge encapsulation about the concrete classes Hiding details about the actual creation and how objects are combined We will be focusing on the following creational design patterns in this article: The abstract factory design pattern The factory method design pattern The lazy initialization design pattern The singleton design pattern The object pool design pattern The builder design pattern The prototype design pattern The following few sections give a brief definition of what these patterns are. The abstract factory design pattern This is used to encapsulate a group of individual factories that have a common theme. When used, the developer creates a specific implementation of the abstract factory and uses its methods in the same way as in the factory design pattern to create objects. It can be thought of as another layer of abstraction that helps to instantiate classes. The factory method design pattern This design pattern deals with the creation of objects without explicitly specifying the actual class that the instance will have—it could be something that is decided at runtime based on many factors. Some of these factors can include operating systems, different data types, or input parameters. It gives developers the peace of mind of just calling a method rather than invoking a concrete constructor. The lazy initialization design pattern This design pattern is an approach to delay the creation of an object or the evaluation of a value until the first time it is needed. It is much more simplified in Scala than it is in an object-oriented language such as Java. The singleton design pattern This design pattern restricts the creation of a specific class to just one object. If more than one class in the application tries to use such an instance, then this same instance is returned for everyone. This is another design pattern that can be easily achieved with the use of basic Scala features. The object pool design pattern This design pattern uses a pool of objects that are already instantiated and ready for use. Whenever someone requires an object from the pool, it is returned, and after the user is finished with it, it puts it back into the pool manually or automatically. A common use for pools are database connections, which generally are expensive to create; hence, they are created once and then served to the application on request. The builder design pattern The builder design pattern is extremely useful for objects with many possible constructor parameters that would otherwise require developers to create many overrides for the different scenarios an object could be created in. This is different to the factory design pattern, which aims to enable polymorphism. Many of the modern libraries today employ this design pattern. As we will see later, Scala can achieve this pattern really easily. The prototype design pattern This design pattern allows object creation using a clone() method from an already created instance. It can be used in cases when a specific resource is expensive to create or when the abstract factory pattern is not desired. Structural design patterns Structural design patterns exist in order to help establish the relationships between different entities in order to form larger structures. They define how each component should be structured so that it has very flexible interconnecting modules that can work together in a larger system. The main features of structural design patterns include the following: The use of the composition to combine the implementations of multiple objects Help build a large system made of various components by maintaining a high level of flexibility In this article, we will focus on the following structural design patterns: The adapter design pattern The decorator design pattern The bridge design pattern The composite design pattern The facade design pattern The flyweight design pattern The proxy design pattern The next subsections will put some light on what these patterns are about. The adapter design pattern The adapter design pattern allows the interface of an existing class to be used from another interface. Imagine that there is a client who expects your class to expose a doWork() method. You might have the implementation ready in another class, but the method is called differently and is incompatible. It might require extra parameters too. This could also be a library that the developer doesn't have access to for modifications. This is where the adapter can help by wrapping the functionality and exposing the required methods. The adapter is useful for integrating the existing components. In Scala, the adapter design pattern can be easily achieved using implicit classes. The decorator design pattern Decorators are a flexible alternative to sub classing. They allow developers to extend the functionality of an object without affecting other instances of the same class. This is achieved by wrapping an object of the extended class into one that extends the same class and overrides the methods whose functionality is supposed to be changed. Decorators in Scala can be built much more easily using another design pattern called stackable traits. The bridge design pattern The purpose of the bridge design pattern is to decouple an abstraction from its implementation so that the two can vary independently. It is useful when the class and its functionality vary a lot. The bridge reminds us of the adapter pattern, but the difference is that the adapter pattern is used when something is already there and you cannot change it, while the bridge design pattern is used when things are being built. It helps us to avoid ending up with multiple concrete classes that will be exposed to the client. You will get a clearer understanding when we delve deeper into the topic, but for now, let's imagine that we want to have a FileReader class that supports multiple different platforms. The bridge will help us end up with FileReader, which will use a different implementation, depending on the platform. In Scala, we can use self-types in order to implement a bridge design pattern. The composite design pattern The composite is a partitioning design pattern that represents a group of objects that are to be treated as only one object. It allows developers to treat individual objects and compositions uniformly and to build complex hierarchies without complicating the source code. An example of composite could be a tree structure where a node can contain other nodes, and so on. The facade design pattern The purpose of the facade design pattern is to hide the complexity of a system and its implementation details by providing the client with a simpler interface to use. This also helps to make the code more readable and to reduce the dependencies of the outside code. It works as a wrapper around the system that is being simplified and, of course, it can be used in conjunction with some of the other design patterns mentioned previously. The flyweight design pattern The flyweight design pattern provides an object that is used to minimize memory usage by sharing it throughout the application. This object should contain as much data as possible. A common example given is a word processor, where each character's graphical representation is shared with the other same characters. The local information then is only the position of the character, which is stored internally. The proxy design pattern The proxy design pattern allows developers to provide an interface to other objects by wrapping them. They can also provide additional functionality, for example, security or thread-safety. Proxies can be used together with the flyweight pattern, where the references to shared objects are wrapped inside proxy objects. Behavioral design patterns Behavioral design patterns increase communication flexibility between objects based on the specific ways they interact with each other. Here, creational patterns mostly describe a moment in time during creation, structural patterns describe a more or less static structure, and behavioral patterns describe a process or flow. They simplify this flow and make it more understandable. The main features of behavioral design patterns are as follows: What is being described is a process or flow The flows are simplified and made understandable They accomplish tasks that would be difficult or impossible to achieve with objects In this article, we will focus our attention on the following behavioral design patterns: The value object design pattern The null object design pattern The strategy design pattern The command design pattern The chain of responsibility design pattern The interpreter design pattern The iterator design pattern The mediator design pattern The memento design pattern The observer design pattern The state design pattern The template method design pattern The visitor design pattern The following subsections will give brief definitions of the aforementioned behavioral design patterns. The value object design pattern Value objects are immutable and their equality is based not on their identity, but on their fields being equal. They can be used as data transfer objects, and they can represent dates, colors, money amounts, numbers, and more. Their immutability makes them really useful in multithreaded programming. The Scala programming language promotes immutability, and value objects are something that naturally occur there. The null object design pattern Null objects represent the absence of a value and they define a neutral behavior. This approach removes the need to check for null references and makes the code much more concise. Scala adds the concept of optional values, which can replace this pattern completely. The strategy design pattern The strategy design pattern allows algorithms to be selected at runtime. It defines a family of interchangeable encapsulated algorithms and exposes a common interface to the client. Which algorithm is chosen could depend on various factors that are determined while the application runs. In Scala, we can simply pass a function as a parameter to a method, and depending on the function, a different action will be performed. The command design pattern This design pattern represents an object that is used to store information about an action that needs to be triggered at a later time. The information includes the following: The method name The owner of the method Parameter values The client then decides which commands need to be executed and when by the invoker. This design pattern can easily be implemented in Scala using the by-name parameters feature of the language. The chain of responsibility design pattern The chain of responsibility is a design pattern where the sender of a request is decoupled from its receiver. This way, it makes it possible for multiple objects to handle the request and to keep logic nicely separated. The receivers form a chain where they pass the request and, if possible, they process it, and if not, they pass it to the next receiver. There are variations where a handler might dispatch the request to multiple other handlers at the same time. This somehow reminds us of function composition, which in Scala can be achieved using the stackable traits design pattern. The interpreter design pattern The interpreter design pattern is based on the ability to characterize a well-known domain with a language with a strict grammar. It defines classes for each grammar rule in order to interpret sentences in the given language. These classes are likely to represent hierarchies as grammar is usually hierarchical as well. Interpreters can be used in different parsers, for example, SQL or other languages. The iterator design pattern The iterator design pattern is when an iterator is used to traverse a container and access its elements. It helps to decouple containers from the algorithms performed on them. What an iterator should provide is sequential access to the elements of an aggregate object without exposing the internal representation of the iterated collection. The mediator design pattern This pattern encapsulates the communication between different classes in an application. Instead of interacting directly with each other, objects communicate through the mediator, which reduces the dependencies between them, lowers the coupling, and makes the overall application easier to read and maintain. The memento design pattern This pattern provides the ability to roll back an object to its previous state. It is implemented with three objects—originator, caretaker, and memento. The originator is the object with the internal state; the caretaker will modify the originator, and a memento is an object that contains the state that the originator returns. The originator knows how to handle a memento in order to restore its previous state. The observer design pattern This design pattern allows the creation of publish/subscribe systems. There is a special object called subject that automatically notifies all the observers when there are any changes in the state. This design pattern is popular in various GUI toolkits and generally where event handling is needed. It is also related to reactive programming, which is enabled by libraries such as Akka. We will see an example of this towards the end of this book. The state design pattern This design pattern is similar to the strategy design pattern, and it uses a state object to encapsulate different behavior for the same object. It improves the code's readability and maintainability by avoiding the use of large conditional statements. The template method design pattern This design pattern defines the skeleton of an algorithm in a method and then passes some of the actual steps to the subclasses. It allows developers to alter some of the steps of an algorithm without having to modify its structure. An example of this could be a method in an abstract class that calls other abstract methods, which will be defined in the children. The visitor design pattern The visitor design pattern represents an operation to be performed on the elements of an object structure. It allows developers to define a new operation without changing the original classes. Scala can minimize the verbosity of this pattern compared to the pure object-oriented way of implementing it by passing functions to methods. Choosing a design pattern As we already saw, there are a huge number of design patterns. In many cases, they are suitable to be used in combinations as well. Unfortunately, there is no definite answer regarding how to choose the concept of designing our code. There are many factors that could affect the final decision, and you should ask yourselves the following questions: Is this piece of code going to be fairly static or will it change in the future? Do we have to dynamically decide what algorithms to use? Is our code going to be used by others? Do we have an agreed interface? What libraries are we planning to use, if any? Are there any special performance requirements or limitations? This is by no means an exhaustive list of questions. There is a huge amount of factors that could dictate our decision in how we build our systems. It is, however, really important to have a clear specification, and if something seems missing, it should always be checked first. By now, we have a fair idea about what a design pattern is and how it can affect the way we write our code. We've iterated through the most famous Gang of Four design patterns out there, and we have outlined the main differences between them. To know more on how to incorporate functional patterns effectively in real-life applications, read our book Scala Design Patterns - Second Edition. Implementing 5 Common Design Patterns in JavaScript (ES8) An Introduction to Node.js Design Patterns
Read more
  • 0
  • 0
  • 21310

article-image-why-arc-welder-is-good-choice-to-run-android-apps-on-desktop-using-chrome-browser
Guest Contributor
21 Aug 2019
5 min read
Save for later

Why ARC Welder is a good choice to run Android apps on desktop using the Chrome browser

Guest Contributor
21 Aug 2019
5 min read
Running Android apps on Chrome is a complicated task, especially when you are not using a Chromebook. However, it should be noted that Chrome has an in-built tool (now) that allows users to test Android-based application in the browser, launched by Google in 2015, known as App Runtime for Chrome (ARC) Welder. What is ARC Welder? The ARC Welder tool allows Android applications to run on Google Chrome for Windows, OS X, Linux systems. ARC Welder is basically for app developers who want to test run their Android applications within Chrome OS and confront any runtime errors or bugs. The tool was launched as an experimental concept for developers previously but later was available for download for everyone. Main functions: ARC Welder offers an easy and streamlined method for application testing. At the first step, the user will be required to add the bundle into the existing application menu. Users are provided with the freedom to write to any file or a folder which can be opened via ARC software assistance. Any beginner developer or a user can choose to leave the settings page as they (settings) will be set to default if skipped or left unsaved. Here’s how to run ARC Welder tool for running android application: Download or upgrade to the latest version of Google Chrome browser. Download and run the ARC Welder application from the Google Chrome Store. Add a third-party APK file host. After downloading the APK app file in your laptop/PC, click Open. Select the mode “Phone” and ‘Tablet”--either of which you wish you run the application on. Lastly, click on the "Launch App" button. Points to remember for running ARC Welder on Chrome: ARC Welder tool only works with APK files, which means that in order to get your Android Applications successfully run on your laptop, you will be required to download APK files of the specific application you wish to install on your desktop. You can find APK files from the below mentioned APK databases: APKMirror AndroidAPKsFree AndroidCrew APKPure Points to remember before installing ARC Welder: Only one specific application can be loaded at one single time. On the basis of your application, you will be required to select the portrait/landscape mode manually. Tablet and Phone mode specifications are necessary as they have different outcomes. ARC Welder is based on Android 4.4. This means that users are required to test applications that support Android 4.4 or above. Note: Points 1 and 2 can be considered as limitations of ARC Welder. Pros: Cross-platform as it works on Windows, Linux, Mac and Chrome OS. Developed by Google which means the software will evolve quickly considering the upgrade pace of Android (also developed by Google). Allows application testing in Google Chrome web browser. Cons: Not all Google Play Services are supported by ARC Welder. ARC Welder only supports “ARM” APK format. Keyboard input is spotty. Takes 2-3 minutes to install as compared to other testing applications like BlueStacks (one-click install). No accelerometer simulation. Users are required to choose the “orientation” mode before getting into the detailed interface of ARC Welder. There are competitors of ARC Welder like BlueStacks which is often preferred by a majority of developers due to its one-click install feature. Although ARC Welder gives a much better performance, it still ranks at 7th (BlueStacks stands at 6th). Apart from shortcomings, ARC Welder continues to evolve and secure its faithful following of beginners to expert developers. In the next section, we’ll have a look at the few alternatives to ARC Welder. Few Alternatives: Genymotion - It is an easy to use android emulator for your computer. It works as a virtual machine and enables you to run mobile apps and games on your desktop and laptop efficiently. Andy - It is an operating system that works as an android emulator for your computer. It allows you to open up mobile apps and play mobile games in a version of the Android operating system on your Mac or Windows desktop. BlueStacks - It is a website that has been built to format mobile apps and make them compatible to the desktop computers. It also helps to open ip mobile gaming apps on computers and laptops. MEmu - It is the fastest android emulator that allows you to play mobile games on PC for free. It is known for its performance, and user experience. It supports most of the popular mobile apps and games, and various system configurations. Koplayer - It is a free, one of the best android emulator for PC that supports video recording, multiple accounts, and keyboard. Built on x86 architecture, it is more stable and faster than Bluestacks. Not to mention, it is very interesting to load android apps on chrome browser on your computer and laptop, no matter which operating system you are using. It could be very useful to run android apps on chrome browser when Google play store and Apple app store are prone to exploitation. Although right now we can run a few apps using ARC Welder, one at a time, surely the developers will add more functionality and take this to the next level. So, are you ready to use mobile apps play mobile games on your PC using ARC Welder? If you have any questions, leave in the comment box, we’ll respond back. Author Bio Hilary is a writer, content manager at Androidcrew.com. She loves to share the knowledge and insights she gained along the way with others.    
Read more
  • 0
  • 0
  • 20996

article-image-polycloud-a-better-alternative-to-cloud-agnosticism
Richard Gall
16 May 2018
3 min read
Save for later

Polycloud: a better alternative to cloud agnosticism

Richard Gall
16 May 2018
3 min read
What is polycloud? Polycloud is an emerging cloud strategy that is starting to take hold across a range of organizations. The concept is actually pretty simple: instead of using a single cloud vendor, you use multiple vendors. By doing this, you can develop a customized cloud solution that is suited to your needs. For example, you might use AWS for the bulk of your services and infrastructure, but decide to use Google's cloud for its machine learning capabilities. Polycloud has emerged because of the intensely competitive nature of the cloud space today. All three major vendors - AWS, Azure, and Google Cloud - don't particularly differentiate their products. The core features are pretty much the same across the market. Of course, there are certain subtle differences between each solution, as the example above demonstrates; taking a polycloud approach means you can leverage these differences rather than compromising with your vendor of choice. What's the difference between a polycloud approach and a cloud agnostic approach? You might be thinking that polycloud sounds like cloud agnosticism. And while there are clearly many similarities, the differences between the two are very important. Cloud agnosticism aims for a certain degree of portability across different cloud solutions. This can, of course, be extremely expensive. It also adds a lot of complexity, especially in how you orchestrate deployments across different cloud providers. True, there are times when cloud agnosticism might work for you; if you're not using the services being provided to you, then yes, cloud agnosticism might be the way to go. However, in many (possibly most) cases, cloud agnosticism makes life harder. Polycloud makes it a hell of a lot easier. In fact, it ultimately does what many organizations have been trying to do with a cloud agnostic strategy. It takes the parts you want from each solution and builds it around what you need. Perhaps one of the key benefits of a polycloud approach is that it gives more power back to users. Your strategic thinking is no longer limited to what AWS, Azure or Google offers - you can instead start with your needs and build the solution around that. How quickly is polycloud being adopted? Polycloud first featured in Thoughtworks' Radar in November 2017. At that point it was in the 'assess' stage of Thoughtworks' cycle; this means it's simply worth exploring and investigating in more detail. However, in its May 2018 Radar report, polycloud had moved into the 'trial' phase. This means it is seen as being an approach worth adopting. It will be worth watching the polycloud trend closely over the next few months to see how it evolves. There's a good chance that we'll see it come to replace cloud agnosticism. Equally, it's likely to impact the way AWS, Azure and Google respond. In many ways, the trend a reaction to the way the market has evolved; it may force the big players in the market to evolve what they offer to customers and clients. Read next Serverless computing wars: AWS Lambdas vs Azure Functions How to run Lambda functions on AWS Greengrass
Read more
  • 0
  • 0
  • 20833
article-image-most-commonly-used-java-machine-learning-libraries
Fatema Patrawala
10 Sep 2018
15 min read
Save for later

6 most commonly used Java Machine learning libraries

Fatema Patrawala
10 Sep 2018
15 min read
There are over 70 Java-based open source machine learning projects listed on the MLOSS.org website and probably many more unlisted projects live at university servers, GitHub, or Bitbucket. In this article, we will review the major machine learning libraries and platforms in Java, the kind of problems they can solve, the algorithms they support, and the kind of data they can work with. This article is an excerpt taken from Machine learning in Java, written by Bostjan Kaluza and published by Packt Publishing Ltd. Weka Weka, which is short for Waikato Environment for Knowledge Analysis, is a machine learning library developed at the University of Waikato, New Zealand, and is probably the most well-known Java library. It is a general-purpose library that is able to solve a wide variety of machine learning tasks, such as classification, regression, and clustering. It features a rich graphical user interface, command-line interface, and Java API. You can check out Weka at http://www.cs.waikato.ac.nz/ml/weka/. At the time of writing this book, Weka contains 267 algorithms in total: data pre-processing (82), attribute selection (33), classification and regression (133), clustering (12), and association rules mining (7). Graphical interfaces are well-suited for exploring your data, while Java API allows you to develop new machine learning schemes and use the algorithms in your applications. Weka is distributed under GNU General Public License (GNU GPL), which means that you can copy, distribute, and modify it as long as you track changes in source files and keep it under GNU GPL. You can even distribute it commercially, but you must disclose the source code or obtain a commercial license. In addition to several supported file formats, Weka features its own default data format, ARFF, to describe data by attribute-data pairs. It consists of two parts. The first part contains header, which specifies all the attributes (that is, features) and their type; for instance, nominal, numeric, date, and string. The second part contains data, where each line corresponds to an instance. The last attribute in the header is implicitly considered as the target variable, missing data are marked with a question mark. For example, the Bob instance written in an ARFF file format would be as follows: @RELATION person_dataset @ATTRIBUTE `Name`  STRING @ATTRIBUTE `Height`  NUMERIC @ATTRIBUTE `Eye color`{blue, brown, green} @ATTRIBUTE `Hobbies`  STRING @DATA 'Bob', 185.0, blue, 'climbing, sky diving' 'Anna', 163.0, brown, 'reading' 'Jane', 168.0, ?, ? The file consists of three sections. The first section starts with the @RELATION <String> keyword, specifying the dataset name. The next section starts with the @ATTRIBUTE keyword, followed by the attribute name and type. The available types are STRING, NUMERIC, DATE, and a set of categorical values. The last attribute is implicitly assumed to be the target variable that we want to predict. The last section starts with the @DATA keyword, followed by one instance per line. Instance values are separated by comma and must follow the same order as attributes in the second section. Weka's Java API is organized in the following top-level packages: weka.associations: These are data structures and algorithms for association rules learning, including Apriori, predictive apriori, FilteredAssociator, FP-Growth, Generalized Sequential Patterns (GSP), Hotspot, and Tertius. weka.classifiers: These are supervised learning algorithms, evaluators, and data structures. Thepackage is further split into the following components: weka.classifiers.bayes: This implements Bayesian methods, including naive Bayes, Bayes net, Bayesian logistic regression, and so on weka.classifiers.evaluation: These are supervised evaluation algorithms for nominal and numerical prediction, such as evaluation statistics, confusion matrix, ROC curve, and so on weka.classifiers.functions: These are regression algorithms, including linear regression, isotonic regression, Gaussian processes, support vector machine, multilayer perceptron, voted perceptron, and others weka.classifiers.lazy: These are instance-based algorithms such as k-nearest neighbors, K*, and lazy Bayesian rules weka.classifiers.meta: These are supervised learning meta-algorithms, including AdaBoost, bagging, additive regression, random committee, and so on weka.classifiers.mi: These are multiple-instance learning algorithms, such as citation k-nn, diverse density, MI AdaBoost, and others weka.classifiers.rules: These are decision tables and decision rules based on the separate-and-conquer approach, Ripper, Part, Prism, and so on weka.classifiers.trees: These are various decision trees algorithms, including ID3, C4.5, M5, functional tree, logistic tree, random forest, and so on weka.clusterers: These are clustering algorithms, including k-means, Clope, Cobweb, DBSCAN hierarchical clustering, and farthest. weka.core: These are various utility classes, data presentations, configuration files, and so on. weka.datagenerators: These are data generators for classification, regression, and clustering algorithms. weka.estimators: These are various data distribution estimators for discrete/nominal domains, conditional probability estimations, and so on. weka.experiment: These are a set of classes supporting necessary configuration, datasets, model setups, and statistics to run experiments. weka.filters: These are attribute-based and instance-based selection algorithms for both supervised and unsupervised data preprocessing. weka.gui: These are graphical interface implementing explorer, experimenter, and knowledge flowapplications. Explorer allows you to investigate dataset, algorithms, as well as their parameters, and visualize dataset with scatter plots and other visualizations. Experimenter is used to design batches of experiment, but it can only be used for classification and regression problems. Knowledge flows implements a visual drag-and-drop user interface to build data flows, for example, load data, apply filter, build classifier, and evaluate. Java-ML for machine learning Java machine learning library, or Java-ML, is a collection of machine learning algorithms with a common interface for algorithms of the same type. It only features Java API, therefore, it is primarily aimed at software engineers and programmers. Java-ML contains algorithms for data preprocessing, feature selection, classification, and clustering. In addition, it features several Weka bridges to access Weka's algorithms directly through the Java-ML API. It can be downloaded from http://java-ml.sourceforge.net; where, the latest release was in 2012 (at the time of writing this book). Java-ML is also a general-purpose machine learning library. Compared to Weka, it offers more consistent interfaces and implementations of recent algorithms that are not present in other packages, such as an extensive set of state-of-the-art similarity measures and feature-selection techniques, for example, dynamic time warping, random forest attribute evaluation, and so on. Java-ML is also available under the GNU GPL license. Java-ML supports any type of file as long as it contains one data sample per line and the features are separated by a symbol such as comma, semi-colon, and tab. The library is organized around the following top-level packages: net.sf.javaml.classification: These are classification algorithms, including naive Bayes, random forests, bagging, self-organizing maps, k-nearest neighbors, and so on net.sf.javaml.clustering: These are clustering algorithms such as k-means, self-organizing maps, spatial clustering, Cobweb, AQBC, and others net.sf.javaml.core: These are classes representing instances and datasets net.sf.javaml.distance: These are algorithms that measure instance distance and similarity, for example, Chebyshev distance, cosine distance/similarity, Euclidian distance, Jaccard distance/similarity, Mahalanobis distance, Manhattan distance, Minkowski distance, Pearson correlation coefficient, Spearman's footrule distance, dynamic time wrapping (DTW), and so on net.sf.javaml.featureselection: These are algorithms for feature evaluation, scoring, selection, and ranking, for instance, gain ratio, ReliefF, Kullback-Liebler divergence, symmetrical uncertainty, and so on net.sf.javaml.filter: These are methods for manipulating instances by filtering, removing attributes, setting classes or attribute values, and so on net.sf.javaml.matrix: This implements in-memory or file-based array net.sf.javaml.sampling: This implements sampling algorithms to select a subset of dataset net.sf.javaml.tools: These are utility methods on dataset, instance manipulation, serialization, Weka API interface, and so on net.sf.javaml.utils: These are utility methods for algorithms, for example, statistics, math methods, contingency tables, and others Apache Mahout The Apache Mahout project aims to build a scalable machine learning library. It is built atop scalable, distributed architectures, such as Hadoop, using the MapReduce paradigm, which is an approach for processing and generating large datasets with a parallel, distributed algorithm using a cluster of servers. Mahout features console interface and Java API to scalable algorithms for clustering, classification, and collaborative filtering. It is able to solve three business problems: item recommendation, for example, recommending items such as people who liked this movie also liked…; clustering, for example, of text documents into groups of topically-related documents; and classification, for example, learning which topic to assign to an unlabeled document. Mahout is distributed under a commercially-friendly Apache License, which means that you can use it as long as you keep the Apache license included and display it in your program's copyright notice. Mahout features the following libraries: org.apache.mahout.cf.taste: These are collaborative filtering algorithms based on user-based and item-based collaborative filtering and matrix factorization with ALS org.apache.mahout.classifier: These are in-memory and distributed implementations, includinglogistic regression, naive Bayes, random forest, hidden Markov models (HMM), and multilayer perceptron org.apache.mahout.clustering: These are clustering algorithms such as canopy clustering, k-means, fuzzy k-means, streaming k-means, and spectral clustering org.apache.mahout.common: These are utility methods for algorithms, including distances, MapReduce operations, iterators, and so on org.apache.mahout.driver: This implements a general-purpose driver to run main methods of other classes org.apache.mahout.ep: This is the evolutionary optimization using the recorded-step mutation org.apache.mahout.math: These are various math utility methods and implementations in Hadoop org.apache.mahout.vectorizer: These are classes for data presentation, manipulation, andMapReduce jobs Apache Spark Apache Spark, or simply Spark, is a platform for large-scale data processing builds atop Hadoop, but, in contrast to Mahout, it is not tied to the MapReduce paradigm. Instead, it uses in-memory caches to extract a working set of data, process it, and repeat the query. This is reported to be up to ten times as fast as a Mahout implementation that works directly with disk-stored data. It can be grabbed from https://spark.apache.org. There are many modules built atop Spark, for instance, GraphX for graph processing, Spark Streaming for processing real-time data streams, and MLlib for machine learning library featuring classification, regression, collaborative filtering, clustering, dimensionality reduction, and optimization. Spark's MLlib can use a Hadoop-based data source, for example, Hadoop Distributed File System (HDFS) or HBase, as well as local files. The supported data types include the following: Local vector is stored on a single machine. Dense vectors are presented as an array of double-typed values, for example, (2.0, 0.0, 1.0, 0.0); while sparse vector is presented by the size of the vector, an array of indices, and an array of values, for example, [4, (0, 2), (2.0, 1.0)]. Labeled point is used for supervised learning algorithms and consists of a local vector labeled with a double-typed class values. Label can be class index, binary outcome, or a list of multiple class indices (multiclass classification). For example, a labeled dense vector is presented as [1.0, (2.0, 0.0, 1.0, 0.0)]. Local matrix stores a dense matrix on a single machine. It is defined by matrix dimensions and a single double-array arranged in a column-major order. Distributed matrix operates on data stored in Spark's Resilient Distributed Dataset (RDD), which represents a collection of elements that can be operated on in parallel. There are three presentations: row matrix, where each row is a local vector that can be stored on a single machine, row indices are meaningless; and indexed row matrix, which is similar to row matrix, but the row indices are meaningful, that is, rows can be identified and joins can be executed; and coordinate matrix, which is used when a row cannot be stored on a single machine and the matrix is very sparse. Spark's MLlib API library provides interfaces to various learning algorithms and utilities as outlined in the following list: org.apache.spark.mllib.classification: These are binary and multiclass classification algorithms, including linear SVMs, logistic regression, decision trees, and naive Bayes org.apache.spark.mllib.clustering: These are k-means clustering org.apache.spark.mllib.linalg: These are data presentations, including dense vectors, sparse vectors, and matrices org.apache.spark.mllib.optimization: These are the various optimization algorithms used as low-level primitives in MLlib, including gradient descent, stochastic gradient descent, update schemes for distributed SGD, and limited-memory BFGS org.apache.spark.mllib.recommendation: These are model-based collaborative filtering implemented with alternating least squares matrix factorization org.apache.spark.mllib.regression: These are regression learning algorithms, such as linear least squares, decision trees, Lasso, and Ridge regression org.apache.spark.mllib.stat: These are statistical functions for samples in sparse or dense vector format to compute the mean, variance, minimum, maximum, counts, and nonzero counts org.apache.spark.mllib.tree: This implements classification and regression decision tree-learning algorithms org.apache.spark.mllib.util: These are a collection of methods to load, save, preprocess, generate, and validate the data Deeplearning4j Deeplearning4j, or DL4J, is a deep-learning library written in Java. It features a distributed as well as a single-machinedeep-learning framework that includes and supports various neural network structures such as feedforward neural networks, RBM, convolutional neural nets, deep belief networks, autoencoders, and others. DL4J can solve distinct problems, such as identifying faces, voices, spam or e-commerce fraud. Deeplearning4j is also distributed under Apache 2.0 license and can be downloaded from http://deeplearning4j.org. The library is organized as follows: org.deeplearning4j.base: These are loading classes org.deeplearning4j.berkeley: These are math utility methods org.deeplearning4j.clustering: This is the implementation of k-means clustering org.deeplearning4j.datasets: This is dataset manipulation, including import, creation, iterating, and so on org.deeplearning4j.distributions: These are utility methods for distributions org.deeplearning4j.eval: These are evaluation classes, including the confusion matrix org.deeplearning4j.exceptions: This implements exception handlers org.deeplearning4j.models: These are supervised learning algorithms, including deep belief network, stacked autoencoder, stacked denoising autoencoder, and RBM org.deeplearning4j.nn: These are the implementation of components and algorithms based on neural networks, such as neural network, multi-layer network, convolutional multi-layer network, and so on org.deeplearning4j.optimize: These are neural net optimization algorithms, including back propagation, multi-layer optimization, output layer optimization, and so on org.deeplearning4j.plot: These are various methods for rendering data org.deeplearning4j.rng: This is a random data generator org.deeplearning4j.util: These are helper and utility methods MALLET Machine Learning for Language Toolkit (MALLET), is a large library of natural language processing algorithms and utilities. It can be used in a variety of tasks such as document classification, document clustering, information extraction, and topic modeling. It features command-line interface as well as Java API for several algorithms such as naive Bayes, HMM, Latent Dirichlet topic models, logistic regression, and conditional random fields. MALLET is available under Common Public License 1.0, which means that you can even use it in commercial applications. It can be downloaded from http://mallet.cs.umass.edu. MALLET instance is represented by name, label, data, and source. However, there are two methods to import data into the MALLET format, as shown in the following list: Instance per file: Each file, that is, document, corresponds to an instance and MALLET accepts the directory name for the input. Instance per line: Each line corresponds to an instance, where the following format is assumed: the instance_name label token. Data will be a feature vector, consisting of distinct words that appear as tokens and their occurrence count. The library comprises the following packages: cc.mallet.classify: These are algorithms for training and classifying instances, including AdaBoost, bagging, C4.5, as well as other decision tree models, multivariate logistic regression, naive Bayes, and Winnow2. cc.mallet.cluster: These are unsupervised clustering algorithms, including greedy agglomerative, hill climbing, k-best, and k-means clustering. cc.mallet.extract: This implements tokenizers, document extractors, document viewers, cleaners, and so on. cc.mallet.fst: This implements sequence models, including conditional random fields, HMM, maximum entropy Markov models, and corresponding algorithms and evaluators. cc.mallet.grmm: This implements graphical models and factor graphs such as inference algorithms, learning, and testing. For example, loopy belief propagation, Gibbs sampling, and so on. cc.mallet.optimize: These are optimization algorithms for finding the maximum of a function, such as gradient ascent, limited-memory BFGS, stochastic meta ascent, and so on. cc.mallet.pipe: These are methods as pipelines to process data into MALLET instances. cc.mallet.topics: These are topics modeling algorithms, such as Latent Dirichlet allocation, four-level pachinko allocation, hierarchical PAM, DMRT, and so on. cc.mallet.types: This implements fundamental data types such as dataset, feature vector, instance, and label. cc.mallet.util: These are miscellaneous utility functions such as command-line processing, search, math, test, and so on. To design, build, and deploy your own machine learning applications by leveraging key Java machine learning libraries, check out this book Machine learning in Java, published by Packt Publishing. 5 JavaScript machine learning libraries you need to know A non programmer’s guide to learning Machine learning Why use JavaScript for machine learning?  
Read more
  • 0
  • 0
  • 20061

article-image-brief-history-minecraft-modding
Aaron Mills
03 Jun 2015
7 min read
Save for later

A Brief History of Minecraft Modding

Aaron Mills
03 Jun 2015
7 min read
Minecraft modding has been around since nearly the beginning. During that time it has gone through several transformations or “eras." The early days and early mods looked very different from today. I first became involved in the community during Mid-Beta, so everything that happened before then is second hand knowledge. A great deal has been lost to the sands of time, but the important stops along the way are remembered, as we shall explore. Minecraft has gone through several development stages over the years. Interestingly, these stages also correspond to the various “eras” of Minecraft Modding. Minecraft Survival was first experienced as Survival Test during Classic, then again in the Indev stage, which gave way to Infdev, then to Alpha and Beta before finally reaching Release. But before all that was Classic. Classic was released in May of 2009 and development continued into September of that year. Classic saw the introduction of Survival and Multiplayer. During this period of Minecraft’s history, modding was in its infancy. On the one hand, Server modding thrived during this stage with several different Server mods available. (These mods were the predecessors to Bukkit, which we will cover later.) Generally, the purpose of these mods was to give server admins more tools for maintaining their servers. On the other hand, however, Client side mods, ones that add new content, didn’t really start appearing until the Alpha stage. Alpha was released in late June of 2010, and it would continue for the rest of the year. Prior to Alpha, came Indev and Infdev, but there isn’t much evidence of any mods during that time period, possibly because of the lack of Multiplayer in Indev and Infdev. Alpha brought the return of Multiplayer, and during this time Minecraft began to see its first simple Client mods. Initially it was just simple modification of existing content: adding support for Higher Resolution textures, new arrow types, bug fixes, compass modifications, etc. The mods were simple and small. This began to change, though, beginning with the creation of the Minecraft Coder Pack, which was later renamed the Mod Coder Pack, commonly known as MCP. (One of the primary creators of MCP, Michael “Searge” Stoyke, now actually works for Mojang.) MCP saw its first release for Alpha 1.1.2_01 sometime in mid 2010. Despite being easily decompiled, Minecraft code was also obfuscated. Obfuscation is when you take all the meaningful names and words in the code and replace it with non-human readable nonsense. The computer can still make sense of it just fine, but humans have a hard time. MCP resolved this limitation by applying meaningful names to the code, making modding significantly easier than ever before. At the same time, but developing completely independently, was the server mod hMod, which gave some simple but absolutely necessary tools to server admins. However, hMod was in trouble as the main dev was MIA. This situation eventually led to the creation of Bukkit, a server mod designed from the ground up to support “plugins” and do everything that hMod couldn’t do. Bukkit was created by a group of people who were also eventually hired by Mojang: Nathan 'Dinnerbone' Adams, Erik 'Grum' Broes, Warren 'EvilSeph' Loo, and Nathan 'Tahg' Gilbert. Bukkit went on to become possibly the most popular Minecraft mod ever created. Many in fact believe its existence is largely responsible for the popularity of online Minecraft servers. However, it will remain largely incompatible with client side mods for some time. Not to be left behind, the client saw another major development late in the year: Risugami’s ModLoader. ModLoader was transformational. Prior to the existence of ModLoader, if you wanted to use two mods, you would have to manually merge the code, line by line, yourself. There were many common tasks that couldn’t be done without editing Minecraft’s base code, things such as adding new blocks and items. ModLoader changed that by creating a framework where simple mods could hook into ModLoader code to perform common tasks that previously required base edits. It was simple, and it would never really expand beyond its original scope. Still, it led modding into a new era. Minecraft Beta, what many call the “Golden Age” of modding, was released just before Christmas in 2010 and would continue through 2011. Beta saw the rise of many familiar mods that are still recognized today, including my own mod, Railcraft. Also IndustrialCraft, Buildcraft, Redpower, and Better than Wolves all saw their start during this period. These were major mods that added many new blocks and features to Minecraft. Additionally, the massive Aether mod, which recently received a modern reboot, was also released during Beta. These mods and more redefined the meaning of “Minecraft Mods”. They existed on a completely new scale, sometimes completely changing the game. But there were still flaws. Mods were still painful to create and painful to use. You couldn’t use IndustrialCraft and Buildcraft at the same time; they just edited too many of the same base files. ModLoader only covered the most common base edits, barely touching the code, and not enough for a major mod. Additionally, to use a mod, you still had to manually insert code into the Minecraft jar, a task that turned many players off of modding. Seeing that their mods couldn’t be used together, the creators of several major mods launched a new project. They would call it Minecraft Forge. Started by Eloraam of Redpower and SpaceToad of Buildcraft, it would see rapid adoption by many of the major mods of the time. Forge built on top of ModLoader, greatly expanding the number of base hooks and allowing many more mods to work together than was previously possible. This ushered in the true “Golden Age” of modding, which would continue from Beta and into Release. Minecraft 1.0 was released in November of 2011, heralding Minecraft’s “Official” release. Around the same time, client modding was undergoing a shift. Many of the most prominent developers were moving on to other things, including the entire Forge team. For the most part, their mods would survive without them, but some would not. Redpower, for example ceased all development in late 2012. Eloraam, SpaceToad, and Flowerchild would hand the reigns of Forge off to LexManos, a relatively unknown name at the time. The “Golden Age” was at an end, but it was replaced by an explosion of new mods and modding was becoming even more popular than ever. The new Forge team, consisting mainly of LexManos and cpw, would bring many new innovations to modding. Eventually they even developed a replacement for Risugami’s ModLoader, naming it ForgeModLoader and incorporating it into Forge. Users would no longer be required to muck around with Minecraft’s internals to install mods. Innovation has continued to the present day, and mods for Minecraft have become too numerous to count. However, the picture for server mods hasn’t been so rosy. Bukkit, the long dominant server mod, suffered a killing blow in 2014. Licensing conflicts developed between the original creators and maintainers, largely revolving around the who “owned” the project after the primary maintainers resigned. Ultimately, one of the most prolific maintainers used a technicality to invalidate the rights of the project to use his code, effectively killing the entire project. A replacement has yet to develop, leaving the server community limping along on increasingly outdated code. But one shouldn’t be too concerned about the future. There have been challenges in the past, but nearly every time a project died, it was soon replaced by something even better. Minecraft has one of the largest, most vibrant, and most mainstream modding communities ever to exist. It’s had a long and varied history, and this has been just a brief glimpse into that heritage. There are many more events, both large and small, that have helped shape the community. May the future of Minecraft continue to be as interesting. About the Author Aaron Mills was born in 1983 and lives in the Pacific Northwest, which is a land rich in lore, trees, and rain. He has a Bachelor's Degree in Computer Science and studied at Washington State University Vancouver. He is best known for his work on the Minecraft Mod, Railcraft, but has also contributed significantly to the Minecraft Mods of Forestry and Buildcraft as well some contributions to the Minecraft Forge project.
Read more
  • 0
  • 0
  • 19780