
Tech Guides - Big Data


NewSQL: What the hype is all about

Amey Varangaonkar
06 Nov 2017
6 min read
First, there was data. Data became the database. Then came SQL. Next came NoSQL. And now comes NewSQL.

NewSQL Origins

For decades, the relational database, or SQL, was the reigning data management standard in enterprises all over the world. With the advent of Big Data and cloud-based storage rose the need for a faster, more flexible, and scalable data management system, one that didn't necessarily comply with SQL's ACID guarantees. This class of systems was popularly dubbed NoSQL, and databases like MongoDB, Neo4j, and others gained prominence in no time.

We can attribute the emergence and eventual adoption of NoSQL databases to a couple of very important factors. The high costs and lack of flexibility of traditional relational databases drove many SQL users away. Also, NoSQL databases are mostly open source, and their enterprise versions are comparatively cheaper too. They are schema-less, meaning they can be used to manage unstructured data effectively. In addition, they scale well horizontally - that is, you can add more machines to increase computing power and use them to handle high volumes of data. All these features of NoSQL come with an important tradeoff, however - these systems cannot guarantee total consistency.

Of late, there has been a rise in another type of database system that aims to combine the best of both worlds. Popularly dubbed NewSQL, this class of systems promises to combine the relational data model of SQL with the scalability and speed of NoSQL.

NewSQL - The dark horse in the databases race

Many call NewSQL 'SQL on steroids'. This is mainly because all NewSQL systems start with the relational data model and the SQL query language, but also incorporate the features that led to the rise of NoSQL - addressing the issues of scalability, flexibility, and high performance. They offer the assurance of ACID transactions, as in relational models. What makes them really unique, however, is that they allow the horizontal scaling of NoSQL and can process large volumes of data with high performance and reliability. This is why businesses really like the concept of NewSQL - the performance of NoSQL and the reliability and consistency of the SQL model, all packed in one.

To understand what the hype surrounding NewSQL is all about, it's worth comparing NewSQL database systems with traditional SQL and NoSQL database systems, and seeing where they stand out:

Characteristic                 | Relational (SQL) | NoSQL    | NewSQL
ACID compliance                | Yes              | No       | Yes
OLTP/OLAP support              | Yes              | No       | Yes
Rigid schema structure         | Yes              | No       | In some cases
Support for unstructured data  | No               | Yes      | In some cases
Performance with large data    | Moderate         | Fast     | Very fast
Performance overhead           | Huge             | Moderate | Minimal
Support from community         | Very high        | High     | Low

As we can see from the table above, NewSQL really comes through as the best option when you're dealing with larger datasets and want to lower performance overheads. To give you a practical example, consider an organization that has to work with a large number of short transactions, accesses a limited amount of data, but executes those queries repeatedly. For such an organization, a NewSQL database system would be a perfect fit. These features are driving the gradual growth of NewSQL systems, although it will take some time for more industries to adopt them.

Not all NewSQL databases are created equal

Today, one has a host of NewSQL solutions to choose from. Some popular solutions are Clustrix, MemSQL, VoltDB, and CockroachDB.
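As a rough illustration of the short, repeated transactions described above, here is a minimal sketch of talking to CockroachDB, one of the solutions just mentioned, from Python. It assumes a locally running node and the psycopg2 driver; the connection string, table, and values are hypothetical. Because CockroachDB speaks the PostgreSQL wire protocol, the standard PostgreSQL driver works unmodified (UPSERT is CockroachDB-specific syntax).

```python
# A minimal sketch: many short OLTP-style transactions against a NewSQL store.
# Assumes a local CockroachDB node and psycopg2; details are hypothetical.
import psycopg2

conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb?sslmode=disable")

with conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS accounts (id INT PRIMARY KEY, balance INT)")
    cur.execute("UPSERT INTO accounts (id, balance) VALUES (1, 1000), (2, 250)")
conn.commit()

def transfer(conn, src, dst, amount):
    """One short transaction - the workload NewSQL systems are built for."""
    with conn.cursor() as cur:
        cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, src))
        cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, dst))
    conn.commit()

for _ in range(100):   # the same small transaction, executed repeatedly
    transfer(conn, 1, 2, 5)

conn.close()
```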
Cloud Spanner, the latest NewSQL offering from Google, became generally available in February 2017 - indicating Google's interest in the NewSQL domain and the value a NewSQL database can add to its existing cloud offerings.

It is important to understand that there are significant differences among these NewSQL solutions, so you should choose one carefully after evaluating your organization's data requirements and problems. As this article on Dataconomy points out, some databases handle transactional workloads well but do not offer native clustering - SAP HANA is one such example. NuoDB focuses on cloud deployments, but its overall throughput is rather sub-par. MemSQL is a suitable choice for clustered analytics but falls short on consistency. The choice of database therefore depends on the task you want to accomplish and on the trade-offs you are willing to accept without letting them affect your workflow too much.

DBAs and Programmers in the NewSQL world

Regardless of which database system an enterprise adopts, the role of DBAs will continue to be important going forward. Core database administration and maintenance tasks such as backup, recovery, and replication will still need to be taken care of. The major challenge for NewSQL DBAs will be choosing, and then customizing, the right database solution for their organization's requirements. Some capacity planning and overall database administration skills might also have to be recalibrated.

Likewise, NewSQL database programmers may find themselves dealing with data manipulation and querying tasks similar to those faced while working with traditional database systems - but at a much larger, or shall we say more 'distributed', scale.

In conclusion

When it comes to solving a particular problem related to data management, it's often said that 80% of the solution comes down to selecting the right tool, and 20% is about understanding the problem at hand. To choose the right database system for your organization, you must ask yourself two questions: What is the nature of the data you will work with? And what are you willing to trade off - in other words, how important are factors such as the scalability and performance of the database system?

For example, if you primarily work with transactional data and prioritize high performance and high scalability, then NewSQL databases might fit the bill perfectly. If you're going to work with volatile data, NewSQL can help there as well, although there are better NoSQL solutions for that kind of problem.

As we have seen, NewSQL databases have been designed to combine the advantages of both relational and NoSQL systems. It is important to note that they are not designed to replace either the NoSQL or the relational SQL model. Rather, they are intentionally built alternatives for data processing that mask the flaws and shortcomings of both relational and non-relational database systems. The ultimate goal of NewSQL is to deliver a high-performance, highly available solution for modern data, without compromising on data consistency and high-speed transaction capabilities.


Hyperledger: The Enterprise-ready Blockchain

Savia Lobo
26 Oct 2017
6 min read
As one of the most widely discussed phenomena across the global media, Blockchain has grown from mere hype into a mainstream reality. Leading industry experts from finance, supply chain, and IoT are collaborating to make Blockchain available for commercial adoption. But while Blockchain is being projected as the future of digital transactions, it still suffers from two major limitations: carrying out private transactions and scalability. As such, a pressing need was widely felt for a Blockchain-based distributed ledger that overcomes these problems.

Enter Hyperledger

Founded by the Linux Foundation in 2015, Hyperledger aims to provide enterprises with a platform to build robust blockchain applications for their businesses and to create open-source, enterprise-grade frameworks for carrying out secure business transactions. It is a hub where leading industries and software developers work collaboratively on blockchain frameworks that can then be used to deploy blockchain applications for industry. With leading players such as IBM, Intel, Accenture, and SAP collaborating with the Hyperledger community, and with the recent addition of BTS, Oracle, and the Patientory Foundation, the community is gaining a lot of traction. No wonder Brian Behlendorf, Executive Director at Hyperledger, says, "Growth and interest in Hyperledger remain high in 2017".

There are a total of eight projects: five frameworks (Sawtooth, Fabric, Burrow, Iroha, and Indy) and three tools (Composer, Cello, and Explorer) supporting those frameworks. Each framework provides a different approach to building the desired blockchain applications. Hyperledger Fabric, the community's first framework, was contributed by IBM. It hosts smart contracts using chaincode, written in Go or Java, which contains the business logic of the ledger. Hyperledger Sawtooth, developed by Intel, offers a modular blockchain architecture. It uses Proof of Elapsed Time (PoET), a consensus algorithm developed by Intel for high efficiency among distributed ledgers. Hyperledger Burrow, a joint proposal by Intel and Monax, is a permissioned smart contract machine. It executes smart contract code following the Ethereum specification, with an engine, a strong audit trail, and a consensus mechanism. Apart from these already launched frameworks, two more - Indy and Iroha - are still in the incubation phase. The Hyperledger community is also building supporting tools: Composer, which has already been launched, and Cello and Explorer, which are awaiting unveiling.

Although a plethora of Hyperledger tools and frameworks are available, in the rest of this article we take Hyperledger Fabric - one of the most popular and trending frameworks - to demonstrate how Hyperledger is being used by businesses.

Why should businesses use Hyperledger?

In order to settle on a framework upon which Blockchain apps can be built, several key aspects are worth considering. Some of the most important among them are portability, security, reliability, interoperability, and user-friendliness. Hyperledger as a platform offers all of the above for building cross-platform, production-ready applications for businesses. Let's take a simple example to see how Hyperledger works for businesses. Consider a restaurant business.
A restaurant owner buys vegetables from a wholesale shop at a much lower cost than in the market. The shopkeeper creates a network wherein other buyers cannot see the price at which vegetables are sold to a particular buyer. Similarly, the restaurant owner can view only his own transactions with the shopkeeper. For the vegetables to reach the restaurant, they must pass through numerous stages such as transport and delivery. The restaurant owner can track the delivery of his vegetables at each stage, and so can the shopkeeper. The transport and delivery organizations, however, cannot see the transaction details. In other words, the shopkeeper can establish a confidential network within a private network of other stakeholders. This type of network can be set up using Hyperledger Fabric.

Let's break the above example down into some of the reasons to consider incorporating Hyperledger into your business networks:

- With Hyperledger you get performance, scalability, and multiple levels of trust.
- You get data on a need-to-know basis - only the parties in the network that need the data get to know about it (see the sketch after this list).
- Backed by big names like Intel and IBM, Hyperledger strives to offer a strong standard for Blockchain code, which in turn provides better functionality at increased speeds.

Furthermore, with the recent release of Fabric v1.0, businesses can create out-of-the-box blockchain solutions on its highly elastic and extensible architecture, made even easier by Hyperledger Composer. Composer helps businesses create smart contracts and blockchain applications without having to know the complex intricacies of the underlying blockchain network. It is a great fit for real-world enterprise usage, built with collaborative efforts from leading industry experts.
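To make the need-to-know visibility from the restaurant example concrete, here is a conceptual toy in plain Python. It is not the Hyperledger Fabric SDK, and every name in it is invented; it only illustrates the idea of per-channel data access that Fabric's private channels provide.

```python
# A conceptual toy, NOT the Hyperledger SDK: it only illustrates
# "need-to-know" visibility from the restaurant example above.
from dataclasses import dataclass, field

@dataclass
class Channel:
    """A private sub-ledger shared only by its members."""
    members: set
    transactions: list = field(default_factory=list)

    def record(self, submitter, details):
        if submitter not in self.members:
            raise PermissionError(f"{submitter} is not a member of this channel")
        self.transactions.append({"by": submitter, "details": details})

    def read(self, reader):
        if reader not in self.members:
            raise PermissionError(f"{reader} may not read this channel")
        return list(self.transactions)

# Shopkeeper and restaurant share pricing; the transporter only sees delivery status.
pricing = Channel(members={"shopkeeper", "restaurant"})
delivery = Channel(members={"shopkeeper", "restaurant", "transporter"})

pricing.record("shopkeeper", {"item": "vegetables", "price": 40})
delivery.record("transporter", {"status": "picked up"})

print(pricing.read("restaurant"))     # allowed
print(delivery.read("transporter"))   # allowed
# pricing.read("transporter")         # would raise PermissionError
```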
Although Ethereum is used by many businesses, here are some of the reasons why Hyperledger could be a better enterprise fit:

- While Ethereum is a public blockchain, Hyperledger is a private blockchain. This means enterprises within the network know who is present on the peer nodes, unlike with Ethereum.
- Hyperledger is a permissioned network, i.e., it can grant permission over who participates in the consensus mechanism of the blockchain network. Ethereum, on the other hand, is permissionless.
- Hyperledger has no built-in cryptocurrency. Ethereum has a built-in cryptocurrency called Ether. Many applications don't need a cryptocurrency to function, and for them this can be a disadvantage of using Ethereum.
- Hyperledger gives you the flexibility of choosing a programming language such as Java or Go for writing smart contracts. Ethereum, on the other hand, uses Solidity, which is far less commonly used.
- Hyperledger is highly scalable - unlike traditional blockchains and Ethereum - with minimal performance losses.

"Since Hyperledger Fabric was designed to meet key requirements for permissioned blockchains with transaction privacy and configurable policies, we've been able to build solutions quickly and flexibly." - Mohan Venkataraman, CTO, IT People Corporation

Future of Hyperledger

The Hyperledger community is expanding rapidly, with many industries collaborating and offering their capabilities to build cross-industry blockchain applications. Hyperledger has found adoption within business networks in varied industries such as healthcare, finance, and supply chain, to build state-of-the-art blockchain applications that assure privacy within decentralized, permissioned networks. It is shaping up to be a technology that can revolutionize the way businesses handle access control within a consortium, with an armor of enhanced security measures. With continuous development on these frameworks, smarter, faster, and more secure business transactions will soon be a reality. Besides, we can expect to see Hyperledger on the cloud, given IBM's plans to extend Blockchain technologies onto its cloud. Add to that the exciting prospect of blending aspects of Artificial Intelligence with Hyperledger, and transactions look more advanced, tamper-proof, and secure than ever before.


Will Ethereum eclipse Bitcoin?

Ashwin Nair
24 Oct 2017
8 min read
Unless you have been living under a rock, you have most likely heard about Bitcoin, the world's most popular cryptocurrency, which is growing by leaps and bounds. In fact, Bitcoin recently broke the $6,000 threshold and is now priced at an all-time high. Bitcoin is not alone in this race: another cryptocurrency, Ethereum, is hot on its heels. Despite being only three years old, Ethereum is quickly emerging as a popular choice, especially among enterprise users.

Ethereum's year-to-date price growth has been a whopping 3000%+. In terms of market cap, too, Ethereum has shown a significant increase: its share of the total cryptocurrency market rose from 5% at the beginning of the year to 30% YTD, and in absolute terms it stands at around $28 billion today. Bitcoin's share of the market, on the other hand, has shrunk from 85% at the start of the year to 55%, with a valuation of around $90 billion.

Bitcoin played a huge role in bringing Ethereum into existence. Ethereum's co-creator and inventor, Vitalik Buterin, was only 19 when his father introduced him to Bitcoin and, by extension, to the fascinating world of cryptocurrency. Within a span of three years, Vitalik had written several blogs on the topic and co-founded Bitcoin Magazine in 2011. Though Bitcoin served as an excellent tool for money transactions, eliminating the need for banks, fees, or third parties, its scripting language had limitations. This led Vitalik, along with other developers, to found Ethereum - a platform that aimed to extend beyond Bitcoin's scope and make the internet decentralized.

How Ethereum differs from the reigning cryptocurrency - Bitcoin

Both Bitcoin and Ethereum are built on top of Blockchain technology, allowing them to run a decentralized public network. However, Ethereum's capability extends beyond being a cryptocurrency, and it differs from Bitcoin substantially in terms of scope and potential.

Exploiting the full spectrum of the blockchain platform

Bitcoin leverages Blockchain's distributed ledger technology to perform secure peer-to-peer cash transactions, disrupting traditional financial transaction instruments such as PayPal. Ethereum, meanwhile, aims to offer much more than a digital currency, by helping developers build and deploy any kind of decentralized application on top of Blockchain. The following are some Ethereum-based features and applications that set it apart from Bitcoin.

DApps

A decentralized app, or DApp, is a program that runs on the internet through a network but is not under the control of any single entity. A white paper on DApps highlights four conditions that need to be satisfied to call an application a DApp:

- It must be completely open source
- Data and records of operation must be cryptographically stored
- It should utilize a cryptographic token
- It must generate tokens

The white paper also goes on to suggest that DApps are the future: "decentralized applications will someday surpass the world's largest software corporations in utility, user-base, and network valuation due to their superior incentivization structure, flexibility, transparency, resiliency, and distributed nature."

Smart Contracts and EVM

Another feature that Ethereum boasts over Bitcoin is the smart contract. A smart contract works like a traditional contract: you can use it to perform a task or transfer money in return for an asset or task, efficiently and without needing interference from a middleman.
Though Bitcoin is fast, secure, and saves costs, it is limited in the kinds of operations it can run. Ethereum solves this problem by allowing operations to work as contracts: they are converted to pieces of code and supervised by a network of computers. A tool that helps Ethereum developers build and experiment with different contracts is the Ethereum Virtual Machine (EVM). It acts as a testing environment for building blockchain operations and is isolated from the main network, giving developers a perfect platform to build and test smart, robust contracts across different industries.

DAOs

One can also create Decentralized Autonomous Organizations (DAOs) using Ethereum. A DAO eliminates the need for human managerial involvement: the organization runs through smart contracts that convert the rules, core tasks, and structure of the organization into code monitored by a fault-tolerant network. An example of a DAO is Slock.it, a DAO version of Airbnb.

Performance

An important factor in cryptocurrency transactions is the amount of time it takes to finalize a transaction, known as the block time. In terms of performance, the Bitcoin network takes about 10 minutes per block, whereas Ethereum is much more efficient, with a block time of just 14-15 seconds.
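The block time figure above is easy to check yourself. Below is a minimal sketch, assuming web3.py v6 and an Ethereum node reachable at http://localhost:8545 (a hypothetical endpoint); it estimates the average block time from the last 100 blocks.

```python
# A minimal sketch: estimate average block time from recent block timestamps.
# Assumes web3.py v6 and a locally reachable Ethereum node (hypothetical endpoint).
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
assert w3.is_connected(), "No Ethereum node reachable at this endpoint"

latest = w3.eth.get_block("latest")
older = w3.eth.get_block(latest.number - 100)

avg_block_time = (latest.timestamp - older.timestamp) / 100
print(f"Average block time over the last 100 blocks: {avg_block_time:.1f} s")
```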
Development

Ethereum's programming language, Solidity, is based on JavaScript. This is great for web developers who want to use their knowledge of JavaScript to build cool DApps and extend the Ethereum platform. Moreover, Ethereum is Turing complete, meaning it can compute anything computable, provided enough resources are available. Bitcoin, on the other hand, is based on C++, which is comparatively not a popular choice among the new generation of app developers.

Community and Vision

One could say Bitcoin itself works like a DAO: no individuals manage the cryptocurrency, and it is completely decentralized and owned by the community. Satoshi Nakamoto, who prefers to stay behind the curtains, is the only name that comes up when relating an individual to Bitcoin. The community therefore lacks a figurehead when it comes to setting future direction. Vitalik Buterin, by contrast, is hugely popular among Ethereum enthusiasts and is closely involved in designing the future roadmap with the other co-founders.

Cryptocurrency Supply

Like Bitcoin, Ethereum has a digital asset, Ether, that fuels the network and the transactions performed on the platform. Bitcoin has a fixed supply cap of around 21 million coins, and it is going to take more than 100 years to mine the last Bitcoin, after which Bitcoin will behave as a deflationary cryptocurrency. Ethereum, on the other hand, has no fixed supply cap but has restricted its annual supply to 18 million Ether. With no upper cap on the number of Ether that can be mined, Ethereum behaves as an inflationary currency and may lose value over time. However, the Ethereum community is now planning to move from a proof-of-work to a proof-of-stake model, which should limit the number of Ether being mined and also offer benefits such as energy efficiency and security.

Some real-world applications using Ethereum

The growth of decentralized applications has been on the rise, as people recognize the value offered by Blockchain and decentralization: security, immutability, tamper-proofing, and much more. While Bitcoin uses its blockchain purely as a list of transactions, Ethereum manages to transfer both value and information through its platform. This allows for immense possibilities when it comes to building DApps across a wide range of industries.

The financial domain is obviously where Ethereum is finding a lot of traction. Projects such as Branche, a decentralized consumer micro-credit and financial services platform, and Augur, a decentralized prediction market that has raised more than $5 million, are prominent examples. But financial applications are only the tip of the iceberg when it comes to the possibilities Ethereum offers and the potential it holds for disrupting industries across various sectors. Some other areas where Ethereum is making its presence felt are:

- Firstblood, a decentralized eSports platform that has raised more than $5.5 million. It allows players to test their skills and bet using Ethereum, while tournaments are tracked on smart contracts and the blockchain.
- Alice.Si, a charitable trust that lets donors invest in noble causes, knowing that they only pay for causes where the charity makes an impact.
- Chainy, an Ethereum-based authentication and verification system that permanently stores records on the blockchain using timestamping.

Flippening is happening!

If you haven't heard of 'Flippening', it's a term coined by cryptocurrency enthusiasts for the prospect of Ethereum beating Bitcoin to become the largest-capitalized blockchain. Comparing Ethereum to Bitcoin may not be entirely fair, as both serve different purposes. Bitcoin will continue to dominate cryptocurrency, but as more industries adopt Ethereum to build the smart contracts, DApps, or DAOs of their choice, its popularity is only going to grow, subsequently making Ether more valuable. Thus, the possibility of Ether displacing Bitcoin is strong. With the pace at which Ethereum is growing and the potential it holds for unleashing Blockchain's power to transform industries, it is a question of when rather than if the Flippening will happen!


"My Favorite Tools to Build a Blockchain App" - Ed, The Engineer

Aaron Lazar
23 Oct 2017
7 min read
Hey! It's great seeing you here. I am Ed, the Engineer, and today I'm going to open up my secret toolbox and share some great tools I use to build Blockchains. If you're a Blockchain developer or a developer-to-be, you've come to the right place! If you are not one, maybe you should consider becoming one.

"There are only 5,000 developers dedicated to writing software for cryptocurrencies, Bitcoin, and blockchain in general. And perhaps another 20,000 had dabbled with the technology, or have written front end applications that connect with the blockchain." - William Mougayar, The Business Blockchain

Decentralized apps, or dapps as they are fondly called, are serverless applications that can be run on the client side within a blockchain-based distributed network. We're going to look at the best tools for building dapps, and over the next few minutes we'll take these tools apart one by one. For a better understanding of where they fit into our development cycle, we'll group them into stages - just like the buildings we build. So, shall we begin? Yes, we can! ;)

The Foundation: Platforms

The first and foremost element for any structure to stand tall and strong is its foundation. The same goes for Blockchain apps. Here, in place of all the mortar, we've got decentralized, public blockchains. There are several existing networks, like Bitcoin, Ethereum, and Hyperledger, that can be used to build dapps. Ethereum and Bitcoin are both decentralized, public chains that are open source, while Hyperledger is private and also open source. Bitcoin may not be a good choice for building dapps, as it was originally designed for peer-to-peer transactions and not for building smart contracts.

The Pillars of Concrete: Languages

Once you've got your foundation in place, you need to start raising pillars that will act as the skeleton of your applications. How do we do this? Well, we've got two great languages specifically for building dapps.

Solidity

An object-oriented language you can use for writing smart contracts. The best part of Solidity is that you can use it across all platforms, making it the number one choice for many developers. It's a lot like JavaScript and more robust than other languages. Along with Solidity, you might want to use Solc, the compiler for Solidity. At the moment, Solidity is the language with the most support and the best documentation.

Serpent

Before the dawn of Solidity, Serpent was the reigning language for building dapps - something like how bricks replaced stone to build massive structures. Serpent is still used in many places to build dapps, and it has great real-time garbage collection.

The Transit Mixers: Frameworks

After you choose your language, you need a framework to simplify the mixing of concrete for your pillars. I find these frameworks interesting:

Embark

A framework for Ethereum that you can use to speed up development and streamline the process with ready-made tools and functionality. It lets you develop and deploy dapps easily, or even build a serverless HTML5 application that uses decentralized technology. It equips you with tools to create new smart contracts, which can be made available in JavaScript code.

Truffle

Another great framework for Ethereum, which takes on the task of managing your contract artifacts for you. It includes support for library linking in complex Ethereum apps and provides custom deployments.
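Before moving on, here is a minimal sketch of driving Solc, the Solidity compiler mentioned above, from Python via the py-solc-x wrapper. The package, the pinned compiler version, and the toy contract are assumptions for illustration; the ABI and bytecode it produces are the artifacts that frameworks like Truffle or Embark manage for you.

```python
# A minimal sketch: compile a toy Solidity contract via py-solc-x (assumed installed).
from solcx import compile_source, install_solc

install_solc("0.8.19")  # fetch a specific solc binary (version is an assumption)

source = """
pragma solidity ^0.8.0;

contract Greeter {
    string public greeting = "Hello, blockchain!";
}
"""

compiled = compile_source(source, output_values=["abi", "bin"], solc_version="0.8.19")
contract_id, interface = compiled.popitem()  # e.g. '<stdin>:Greeter'

print(contract_id)
print(interface["abi"])  # the ABI a framework or dapp frontend would consume
```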
The Contractors: Integrated Development Environments

Maybe you are not the kind that likes to build things from scratch. You just need a one-stop place where you can say what kind of building you want and everything else falls into place. Hire a contractor. If you're looking for the complete package to build dapps, there are two great tools you can use: Ethereum Studio and Remix (Browser-Solidity). These IDEs take care of everything, from emulating the live network to testing and deploying your dapps.

Ethereum Studio

An adapted version of Cloud9, built for Ethereum with some additional tools. It has a blockchain emulator called the sandbox, which is great for writing automated tests. Fair warning: you must pay for this tool, as it's not open source, and you must use Azure Cloud to access it.

Remix

Remix can do pretty much the same things that Ethereum Studio can. You can run Remix from your local computer and let it communicate with an Ethereum node client on your local machine, which lets you execute smart contracts while connected to your local blockchain. Remix was still under development at the time of writing this article.

The Rebound Hammer: Testing tools

Nothing goes live until it's tried and tested. Just like the rebound hammer you might use to check the quality of concrete, we have a great tool that helps you test dapps.

Blockchain Testnet

For testing purposes, use a testnet - an alternative blockchain. Whether you want to create a new dapp on Ethereum or any other chain, I recommend using the related testnet, which works as a substitute for the true blockchain your real dapp will eventually use. Testnet coins are different from actual bitcoins and do not hold any value, allowing you as a developer or tester to experiment without needing to use real bitcoins or having to worry about breaking the primary bitcoin chain.
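As a minimal sketch of that "test before you touch the real chain" idea, the snippet below assumes web3.py with the optional eth-tester backend installed. It spins up an ephemeral, in-memory chain with pre-funded test accounts, so no real ether or testnet faucet is needed; everything here is illustrative rather than a recommended setup.

```python
# A minimal sketch: an in-memory test chain via web3.py's eth-tester backend.
from web3 import Web3, EthereumTesterProvider

w3 = Web3(EthereumTesterProvider())          # ephemeral, in-memory blockchain
alice, bob = w3.eth.accounts[0], w3.eth.accounts[1]

tx_hash = w3.eth.send_transaction({
    "from": alice,
    "to": bob,
    "value": w3.to_wei(1, "ether"),
})
receipt = w3.eth.wait_for_transaction_receipt(tx_hash)

print("Mined in block", receipt.blockNumber)
print("Bob's balance:", w3.from_wei(w3.eth.get_balance(bob), "ether"), "ETH")
```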
The Wallpaper: dapp Browsers

Once you've developed your dapp, it needs to look good for consumers to use it. Dapp browsers are mostly the user interfaces for the decentralized web. Two popular tools that bring dapps to your browser are Mist and Metamask.

Mist

A popular browser for decentralized web apps. Just as Firefox or Chrome are for Web 2.0, the Mist browser will be for the decentralized Web 3.0. Ethereum developers can use Mist not only to store Ether or send transactions but also to deploy smart contracts.

Metamask

With Metamask, you can comfortably run dapps in your browser without having to run a full Ethereum node. It includes a secure identity vault that provides a UI to manage your identities on various sites, as well as to sign blockchain contracts.

There! Now you can build a Blockchain!

Now you have all the tools you need to make amazing and reliable dapps. I know you're always hungry for more - this GitHub repo created by Christopher Allen has a great listing of tools and resources you can use to begin or improve your Blockchain development skills. If you're one of those lazy-but-smart folks who want to get things done at the click of a mouse button, then BaaS, or Blockchain as a Service, is something you might be interested in. There are several big players in this market at the moment, the likes of IBM, Azure, SAP, and AWS. BaaS is basically for organizations and enterprises that need blockchain networks that are open, trusted, and ready for business. If you go the BaaS way, though, let me warn you - you're probably going to miss out on all the fun of building your very own blockchain from scratch.

With so many banks and financial entities beginning to set up their own blockchains for recording transactions and transferring assets, and with investors betting billions on distributed-ledger startups, there are hardly a handful of developers out there with the required skills. That leaves you with a strong enough reason to build great blockchains and sharpen your skills in the area. Our Building Blockchain Projects book should help you put some of these tools to use in building reliable and robust dapps. So what are you waiting for? Go grab it now and have fun building blockchains!


Top 4 chatbot development frameworks for developers

Sugandha Lahoti
20 Oct 2017
8 min read
The rise of the bots is nigh! If you can imagine a situation involving a dialog, there is probably a chatbot for that. Just look at the chatbot market: text-based email/SMS bots, voice-based bots, bots for customer support, transaction-based bots, entertainment bots, and many others. A large number of enterprises, from startups to established organizations, are seeking to invest in this sector. This has also led to an increase in the number of platforms used for building chatbots. These frameworks incorporate AI techniques along with natural language processing capabilities to assist developers in building and deploying chatbots. Let's start with how a chatbot typically works before diving into some of the frameworks.

Understand: The first step for any chatbot is to understand the user input. This is made possible using pattern matching and intent classification techniques. 'Intents' are the tasks that users might want to perform with a chatbot. Machine learning, NLP, and speech recognition techniques are typically used to identify the intent of the message and extract named entities. Entities are the specific pieces of information extracted from the user's response, i.e. the content associated with an intent.

Respond: After understanding, the next goal is to generate a response, based on the current input message and the context of the conversation. After specifying the intents and entities, a dialog flow is constructed; this is essentially the set of replies and feedback expected from the chatbot.

Learn: Chatbots use AI techniques such as natural language understanding and pattern recognition to store and distinguish between the contexts of the information provided and elicit a suitable response for future replies. This is important because different requests might have different meanings depending on previous requests.

Top chatbot development frameworks

A bot development framework is a set of predefined classes, functions, and utilities that a developer can use to build chatbots more easily and quickly. Frameworks vary in their level of complexity, integration capabilities, and functionality. Let us look at some of the development platforms used for chatbot building.

API.AI

API.AI, a code-based framework with a simple web-based interface, allows users to build engaging voice and text-based conversational apps using a large number of libraries and SDKs, including Android, iOS, Webkit HTML5, Node.js, and a Python API. It also supports nearly 32 one-click platform integrations, such as Google, Facebook Messenger, Twitter, and Skype, to name a few. API.AI makes use of an agent - a container that transforms natural-language user requests into actionable data. The software tries to find the intent behind a user's reply and matches it to the default or the closest match. After intent matching, it executes the actions and responses the developer has defined for that intent. API.AI also makes use of entities. Once the intents and entities are specified, the bot is trained. API.AI's training module efficiently tracks each user request and lets developers see how requests are parsed and matched to an intent. It also allows for the correction of errors and change requests, thus retraining the bot. API.AI streamlines the entire bot-creation process by helping developers provide the domain-specific knowledge that is unique to a bot's needs, while handling speech recognition, intent, and context management in the backend. Google has recently partnered with API.AI to help them build conversational tools like Apple's Siri.
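To make the intent and entity terminology above concrete, here is a deliberately naive sketch in Python. It is not API.AI's SDK - the intents, keywords, and replies are invented - but it mimics the "understand" step with simple pattern matching and a toy entity extractor.

```python
# A conceptual toy: keyword-based intent matching plus a regex entity extractor.
# All intents, keywords, and replies are invented for illustration.
import re

INTENTS = {
    "order_pizza": {"keywords": {"pizza", "order"}, "reply": "Which toppings would you like?"},
    "check_weather": {"keywords": {"weather", "rain", "forecast"}, "reply": "Which city?"},
}

def extract_entities(text):
    """Pull out a simple entity, e.g. the quantity '2' in 'order 2 pizzas'."""
    numbers = re.findall(r"\b\d+\b", text)
    return {"quantity": int(numbers[0])} if numbers else {}

def understand(text):
    words = set(text.lower().split())
    for name, intent in INTENTS.items():
        if intent["keywords"] & words:        # crude pattern matching
            return name, extract_entities(text), intent["reply"]
    return "fallback", {}, "Sorry, I didn't get that."

print(understand("I want to order 2 pizzas"))
# ('order_pizza', {'quantity': 2}, 'Which toppings would you like?')
```

A real framework replaces the keyword sets with trained intent classifiers and the regex with named-entity recognition, but the understand-respond loop is the same.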
Microsoft Bot Framework

The Microsoft Bot Framework allows the building and deployment of chatbots across multiple platforms and services, such as the web, SMS, non-Microsoft platforms, Office 365, and Skype. The Bot Framework includes two components: the Bot Builder and Microsoft Cognitive Services. The Bot Builder comprises two full-featured SDKs - for the .NET and Node.js platforms - along with an emulator for testing and debugging, plus a set of RESTful APIs for building bots in other languages. The SDKs support features for simple and easy interactions between bots, and they ship with a large collection of prebuilt sample bots for developers to choose from. Microsoft Cognitive Services is a collection of intelligent APIs that simplify a variety of AI tasks, such as allowing the system to understand and interpret the user's needs using natural language in just a few lines of code. These APIs integrate with most modern languages and platforms and constantly improve, learn, and get smarter.

Microsoft created the AI Inner Circle Partner Program to work hand in hand with industry to create AI solutions. Their only partner in the UK is ICS.AI, who build conversational AI solutions for the UK's public sector. ICS are the first choice for many organisations due to their smart solutions that scale and serve to improve services for the general public.

Developers can build bots in the Bot Builder SDK using C# or Node.js, then add AI capabilities with Cognitive Services. Finally, they can register the bots on the developer portal, connecting them to users across platforms such as Facebook and Microsoft Teams, and deploy them on a cloud such as Microsoft Azure. For a step-by-step guide to chatbot building using the Microsoft Bot Framework, you can refer to one of our books on the topic. Sabre Corporation, a customer service provider for travel agencies, has recently announced the development of an AI-powered chatbot that leverages the Microsoft Bot Framework and Microsoft Cognitive Services.

Watson Conversation

IBM's Watson Conversation helps build chatbot solutions that understand natural-language input and use machine learning to respond to customers in a way that simulates conversation between humans. It is built on a neural network of one million Wikipedia words and offers deployment across a variety of platforms, including mobile devices, messaging platforms, and robots. The platform is robust and secure, as IBM allows users to opt out of data sharing. The IBM Watson Tone Analyzer service can also help bots understand the tone of the user's input for better management of the experience.

The basic steps to create a chatbot using Watson Conversation are as follows. First, create a workspace - a place for configuring information, maintaining separate intents, user examples, entities, and dialogs for each application. One workspace corresponds to one bot. Next, create intents. Watson Conversation uses multiple conditioned responses to distinguish between similar intents. For example, instead of building specific intents for the locations of different places, it creates a general intent "location" and adds an entity to capture the response, such as "location-bedroom" (to the right, near the stairs) or "location-kitchen" (to the left). The third step is entity establishment, which involves grouping entities that might trigger a similar response in the dialog.
The dialog flow generated after specifying the intents and entities then goes through testing before being embedded into an application and connected with other services via the Conversation API. Staples, an office supply retailer, uses Watson Conversation in its "Easy Systems" to simplify the customer shopping experience.

CXP Designer and Aspect NLU

The Aspect Customer Experience Platform (CXP) is an application lifecycle management tool for building text- and voice-based applications such as chatbots. It provides deployment options across multiple communication channels, including text, voice, the mobile web, and social media networks. Aspect CXP typically includes the CXP designer for building chatbots and the built-in Aspect NLU for advanced natural language capabilities. The CXP designer works by creating dialog objects that provide a menu of options for the frontend as well as the backend. Menu items for the frontend are used to create intents and modules within those intents. The developer can then modify the labels of those intents and modules manually, or use the Aspect NLU to disambiguate similar questions for successful extraction of meaning and intent. The Aspect NLU includes tools for spelling correction, linguistic lexicons (nouns, verbs, and so on), and options for detecting and extracting common data types such as dates, times, and numbers. It also lets developers adjust the meaning extraction to behave the way they want. The CXP designer also allows certain steps to be skipped in a chatbot: for instance, if the user has already provided the tracking ID for a particular package, the chatbot will skip the prompt asking for the tracking ID again. With Aspect CXP, developers can create and deploy complex chatbots. Radisson Blu Edwardian, a hotel in London, has collaborated with Aspect Software to build an SMS-based AI virtual host.

Conclusion

Another popular chatbot development platform worth mentioning is Facebook Messenger, with over 100,000 monthly active bots, though it lacks cross-platform deployment features. The bot frameworks above are typically used by developers to build chatbots from scratch and require some programming skills. However, there has recently been a rise in automated bot development tools, such as Chatfuel and Motion AI, which typically involve drag-and-drop functionality. With such tools, beginners and non-programmers can create and deploy chatbots within a few minutes. But they lack the extended functionality supported by typical code-based frameworks, such as the flexibility to store data, produce analytics, or incorporate customized AI tasks. Every chatbot development system, whether framework or tool, serves a different purpose. Choosing the right one depends on the type of application to be built, organizational needs, and the developer's expertise.


Introducing Intelligent Apps

Amarabha Banerjee
19 Oct 2017
6 min read
We are a species that has been obsessed with 'intelligence' since gaining consciousness. We have always been inventing ways to make our lives better through sheer imagination and the application of our intelligence. So it comes as no surprise that we want our modern-day creations to be smart as well - be it a web app or a mobile app. The first question that comes to mind, then, is what makes an application 'intelligent'? A simple answer for budding developers is that intelligent apps are apps that can take intuitive decisions or provide customized recommendations and experiences to their users, based on insights drawn from data collected from their interactions with humans. This brings up a whole set of new questions: how can intelligent apps be implemented, what are the challenges, what are the primary application areas of these so-called intelligent apps, and so on. Let's start with the first question.

How can intelligence be infused into an app?

The answer has many layers, just like an app does. The monumental growth in data science and its underlying data infrastructure has allowed machines to process, segregate, and analyze huge volumes of data in limited time. Now it looks set to enable machines to glean meaningful patterns and insights from that very same data. One interesting example is predicting user behavior patterns: what movies, food, or brand of clothing a user might be interested in, what songs they might like to listen to at different times of their day, and so on. These are, of course, on the simpler side of the spectrum of intelligent tasks that we would like our apps to perform, and many apps by Amazon, Google, Apple, and others are implementing and perfecting them on a day-to-day basis.

Complex tasks are a series of simple tasks performed in an intelligent manner. One such complex task would be the ability to perform facial recognition and speech recognition, and then use them to carry out relevant daily tasks, be it at home or in the workplace. This is where we enter the realm of science fiction: your mobile app recognizes your voice command while you are driving back home and sends automated instructions to different home appliances - your microwave, AC, and your PC - so that your food is served hot when you reach home, your room is set at just the right temperature, and your PC has automatically opened the next project you would like to work on. All of that happens while you enter your home keys-free, thanks to facial recognition software that can map your face and identify you with more than 90% accuracy, even in low lighting conditions. APIs like IBM Watson, AT&T Speech, the Google Speech API, the Microsoft Face API, and others provide developers with tools to incorporate features such as these into their apps to create smarter apps. It sounds almost magical! But is it that simple? This brings us to the next question.

What are some development challenges for an intelligent app?

The challenges are different for web and mobile apps.

Challenges for intelligent web apps

For web apps, choosing the right mix of algorithms and APIs that can turn your machine learning code into a working web app is the primary challenge. Plenty of web APIs, like IBM Watson and AT&T Speech, are available to do this. But not all APIs can perform all the complex tasks we discussed earlier. Suppose you want an app that successfully performs both voice and speech recognition and then also performs reinforcement learning by learning from your interactions with it.
You will have to use multiple APIs to achieve this, and their integration into a single app then becomes a key challenge. Here is why: every API has its own data transfer protocols and backend integration requirements and challenges. Our backend requirements therefore increase significantly, both in terms of data persistence and in terms of dynamic data availability and security. The fact that each of these smart apps needs a customized user interface design also poses a challenge for the frontend developer: making a user interface so fluid and adaptive that it supports the different preferences of different smart apps. Clearly, putting together a smart web app is no child's play. That's perhaps why smart voice-controlled apps like Alexa are still merely working as assistants and providing only predefined solutions: their ability to execute complex voice-based tasks and commands is fairly low, let alone any task not based on voice commands.

Challenges for intelligent mobile apps

For intelligent mobile apps, the challenges are manifold. A key reason is network dependency for data transfer. Although the advent of 4G and 5G mobile networks has greatly improved mobile network speed, network availability and data transfer speeds still pose a major challenge, due to the high volumes of data that intelligent mobile apps need in order to perform efficiently. To circumvent this limitation, vendors like Google are trying to implement smarter APIs in the mobile device's local storage. But this approach requires a huge increase in the mobile chip's computation capabilities - something that's not currently available. Maybe that's why Google has also hinted at jumping into the chip manufacturing business if its computation needs are not met. Apart from these issues, running multiple intelligent apps at the same time would also require a significant increase in the battery life of mobile devices.

Finally comes the last question.

What are some key applications of intelligent apps?

We have explored some areas of application in the previous sections, keeping our focus on web and mobile apps. Broadly speaking, whatever makes our daily life easier is a potential application area for intelligent apps. From controlling the AC temperature automatically, to controlling the oven and microwave remotely, to a vacuum cleaner with robotic AI capabilities, to driving the car - everything falls in the domain of intelligent apps. The real questions for us are: What can we achieve with our modern computation resources and our data handling capabilities? And how can mobile computation capabilities and chip architecture be improved drastically, so that smart apps can perform complex tasks faster and ease our daily workflow? Only the future holds the answer. We are rooting for the day when we will rise to become a smarter race by delegating less important yet intelligent tasks to smarter systems - by creating intelligent web and mobile apps efficiently and effectively. The culmination of these apps, along with hardware-driven AI systems, could eventually lead to independent smart systems - a topic we will explore in the coming days.

DevOps might be the key to your Big Data project success

Ashwin Nair
11 Oct 2017
5 min read
So, you probably believe in the power of Big Data and the potential it has to change the world. Your company might already have invested in a big data project, or be planning to. That's great! But what if I were to tell you that only 15% of businesses have been able to successfully deploy their Big Data projects to production? That can't be a good sign, surely! Now, don't just go freeing up your Big Data budget. Not yet.

Big Data's Big Challenges

For all the hype around Big Data, research suggests that many organizations are failing to leverage its opportunities properly. A recent survey by NewVantage Partners, for example, explored the challenges facing organizations currently running their own Big Data projects or trying to adopt them. Here's what they had to say:

"In spite of the successes, executives still see lingering cultural impediments as a barrier to realizing the full value and full business adoption of Big Data in the corporate world. 52.5% of executives report that organizational impediments prevent realization of broad business adoption of Big Data initiatives. Impediments include lack or organizational alignment, business and/or technology resistance, and lack of middle management adoption as the most common factors. 18% cite lack of a coherent data strategy."

Clearly, even some of the most successful organizations are struggling to get a handle on Big Data. Interestingly, it's not so much gaps in technology or even skills, but rather a lack of culture and organizational alignment that's making life difficult. This isn't actually that surprising. The problem of managing the effects of technological change goes far beyond Big Data - it's impacting the modern workplace in just about every department, from how people work together to how you communicate with and sell to customers.

DevOps Distilled

It's out of this scenario that we've seen the irresistible rise of DevOps. DevOps, for the uninitiated, is an agile methodology that aims to improve the relationship between development and operations. It aims to ensure fluid collaboration between teams, with a focus on automating and streamlining monotonous and repetitive tasks within a given development lifecycle, thus reducing friction and saving time. We can begin to see, then, that this approach - usually applied to typical software development scenarios - might actually offer a solution to some of the problems faced when it comes to big data.

A typical Big Data project

Like a software development project, a Big Data project will have multiple teams working on it in isolation. For example, a big data architect will look into the project requirements and design a strategy and roadmap for implementation, while the data storage and admin team will be dedicated to setting up a data cluster and provisioning infrastructure. Finally, you'll probably find data analysts who process, analyze, and visualize data to gain insights. Depending on the scope and complexity of your project, it is possible that more teams are brought in - say, data scientists roped in to train and build custom machine learning models.

DevOps for Big Data: A match made in heaven

Clearly, there are a lot of moving parts in a typical Big Data project, with each role performing considerably complex tasks. By adopting DevOps, you'll reduce the silos that exist between these roles, breaking down internal barriers and embedding Big Data within a cross-functional team.
It's also worth noting that this move doesn't just give you an operational efficiency advantage - it also gives you much more control and oversight over strategy. By building a cross-functional team, rather than asking teams to collaborate across functions (which sounds good in theory but always proves challenging), there is a much more acute sense of a shared vision or goal. Problems can be solved together, and discussions can take place constantly and effectively. With the operational problems minimized, everyone can focus on the interesting stuff.

By bringing DevOps thinking into big data, you also set the foundation for what's called continuous analytics. Taking the principle of continuous integration - fundamental to effective DevOps practice, whereby code is integrated into a shared repository after every task or change to ensure complete alignment - continuous analytics streamlines the data science lifecycle by ensuring a fully integrated approach to analytics, where as much as possible is automated through algorithms. This takes away the boring stuff, once again ensuring that everyone within the project team can focus on what's important.

We've come a long way from Big Data being a buzzword - today, it's the new normal. If you've got a lot of data to work with, analyze, and understand, you had better make sure you've got the right environment set up to make the most of it. That means there's no longer an excuse for Big Data projects to fail, and certainly no excuse not to get one up and running. If it takes DevOps to make Big Data work for businesses, then it's a MINDSET worth cultivating and running with.


What we learned from Oracle OpenWorld 2017

Amey Varangaonkar
06 Oct 2017
5 min read
"Amazon's lead is over." These famous words from Oracle CTO Larry Ellison at Oracle OpenWorld 2016 garnered a lot of attention, as Oracle promised customers an extensive suite of cloud offerings and offered a closer look at its second-generation IaaS data centers. At the recently concluded OpenWorld 2017, Oracle continued its quest to take on AWS and the other major cloud vendors by unveiling a host of cloud-based products and services. Not just that - they have juiced these offerings up with Artificial Intelligence-based features, in line with all the buzz surrounding AI.

Key highlights from the Oracle OpenWorld 2017

Autonomous Database

Oracle announced a fully automated, self-driving database that requires no human intervention for managing or fine-tuning the database. Using machine learning and AI to eliminate human error, the new database guarantees 99.995% availability. Taking another shot at AWS, Ellison promised in his keynote that customers moving from Amazon's Redshift to Oracle's database can expect a 50% cost reduction. Likely to be named Oracle 18c, the new database is expected to ship worldwide by December 2017.

Oracle Blockchain Cloud Service

Oracle joined IBM in the race to dominate the Blockchain space by unveiling its new cloud-based Blockchain service. Built on top of the Hyperledger Fabric project, the service promises to transform the way business is done by offering secure, transparent, and efficient transactions. Other enterprise-critical features such as provisioning, monitoring, backup, and recovery are also among the standard features the service will offer its customers. "There are not a lot of production-ready capabilities around Blockchain for the enterprise. There [hasn't been] a fully end-to-end, distributed and secure blockchain as a service," said Amit Zavery, Senior VP at Oracle Cloud. It is also worth remembering that Oracle joined the Hyperledger consortium just two months ago, and the signs that it would release its own service were already there.

Improvements to Business Management Services

The new features and enhancements introduced for Oracle's business management services were another key highlight of OpenWorld 2017. These features now empower businesses to manage their customers better and to plan for the future with better organization of resources. Some important announcements in this area were:

- AI capabilities added to its cloud services - the Oracle Adaptive Intelligent Apps will now use AI to improve services for any kind of business
- Developers can now create their own AI-powered Oracle applications, making use of deep learning
- AI-powered chatbots for better customer and employee engagement
- Enhanced user experience in the Oracle ERP Cloud and improved recruiting in the HR Cloud services

Key Takeaways from Oracle OpenWorld 2017

With these announcements, Oracle has given a clear signal that it is to be taken seriously. The company is already buoyed by a strong Q1 result, which saw revenue from cloud platforms hit $1.5 billion - a growth of 51% compared to Q1 2016. Here are some key takeaways from OpenWorld 2017, underlined by the announcements above.

Oracle undoubtedly sees cloud as the future and has placed a lot of focus on the performance of its cloud platform.
They're betting that their familiarity with traditional enterprise workloads will help them win a lot more customers - something Amazon cannot claim.

Oracle are riding the AI wave and are trying to make their products as autonomous as possible - to reduce human intervention and, to some extent, human error. With enterprises looking to cut costs wherever possible, this could be a smart move to attract more customers.

The autonomous database will need to fine-tune, patch, and upgrade itself automatically, without causing any downtime. It will be interesting to see if the database can live up to its promise of '99.995% availability'.

Is the role of Oracle DBAs going to be at risk due to the automation? While it is doubtful that they will be out of jobs, there is bound to be a significant shift in their day-to-day operations. DBAs are likely to spend less time on traditional administration tasks such as fine-tuning, patching, and upgrading, and instead focus on efficient database design, setting data policies and securing the data.

Cybersecurity was a key theme in Ellison's keynote and at OpenWorld 2017 in general. As enterprise Blockchain adoption grows, so does the need for a secure, efficient digital transaction system. Oracle seem to have identified this opportunity, and it will be interesting to see how they compete with the likes of IBM and SAP to gain major market share.

Oracle's CEO Mark Hurd has predicted that Oracle can win the cloud wars, overcoming the likes of Amazon, Microsoft and Google. Judging by the announcements at OpenWorld 2017, it seems like they may have a plan in place to actually pull it off.

You can watch highlights from the Oracle OpenWorld 2017 on demand here. Don't forget to check out our highly popular book Oracle Business Intelligence Enterprise Edition 12c, your one-stop guide to building an effective Oracle BI 12c system.
Read more

article-image-what-is-streaming-analytics-and-why-is-it-important
Amey Varangaonkar
05 Oct 2017
5 min read
Save for later

Say hello to Streaming Analytics

Amey Varangaonkar
05 Oct 2017
5 min read
In this data-driven age, businesses want fast, accurate insights from their huge data repositories in the shortest time span — and in real time when possible. These insights are essential — they help businesses understand relevant trends, improve their existing processes, enhance customer satisfaction, improve their bottom line, and most importantly, build and sustain their competitive advantage in the market.

Doing all of this is quite an ask - one that is becoming increasingly difficult to achieve using just the traditional data processing systems, where analytics is limited to the back-end. There is now a burning need for a newer kind of system, where larger, more complex data can be processed and analyzed on the go.

Enter: Streaming Analytics

Streaming Analytics, also referred to as real-time event processing, is the processing and analysis of large streams of data in real time. These streams are made up of events that occur as a result of some action: a transaction, a system failure, or a trigger that changes the state of a system at any point in time. Depending on the context, even something as minor or granular as a click constitutes an event.

Consider this scenario - you are the CTO of an organization that deals with sensor data from wearables. Your organization has to deal with terabytes of data coming in daily from thousands of sensors. One of your biggest challenges as a CTO would be to implement a system that processes and analyzes the data from these sensors as it enters the system. Here's where streaming analytics can help, by giving you the ability to derive insights from your data on the go.

According to IBM, a streaming system demonstrates the following qualities:

It can handle large volumes of data
It can handle a variety of data and analyze it efficiently — be it structured or unstructured — identifying relevant patterns accordingly
It can process every event as it occurs, unlike traditional analytics systems that rely on batch processing

Why is Streaming Analytics important?

The humongous volume of data that companies have to deal with today is almost unimaginable. Add to that the varied nature of data that these companies must handle, and the urgency with which value needs to be extracted from this data - it all makes for a pretty tricky proposition. In such scenarios, choosing a solution that integrates seamlessly with different data sources, is fine-tuned for performance, is fast and reliable, and most importantly is flexible to changes in technology, becomes critical. Streaming analytics offers all of these features, thereby empowering organizations to gain that significant edge over their competition.

Another significant argument in favour of streaming analytics is the speed at which one can derive insights from the data. Data in a real-time streaming system is processed and analyzed before it registers in a database. This is in stark contrast to analytics on traditional systems, where information is gathered, stored, and only then analyzed. Streaming analytics therefore supports much faster decision-making than traditional data analytics systems.
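To make this concrete, here is a minimal sketch of what such a pipeline can look like, using PySpark's Structured Streaming API. The socket source, port, and comma-separated line format are assumptions made purely for illustration; a production system would more likely read from something like Kafka.

```python
# Minimal Structured Streaming sketch: count events per device over 1-minute windows.
# Assumes a local socket source emitting lines like "device_id,reading" (illustration only).
from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col, window, current_timestamp

spark = (SparkSession.builder
         .appName("streaming-analytics-sketch")
         .getOrCreate())

# Read a continuous stream of text lines.
raw = (spark.readStream
       .format("socket")
       .option("host", "localhost")
       .option("port", 9999)
       .load())

# Parse each line and stamp it with an arrival time.
events = (raw
          .withColumn("device_id", split(col("value"), ",").getItem(0))
          .withColumn("event_time", current_timestamp()))

# Aggregate on the fly: events per device per 1-minute window.
counts = (events
          .groupBy(window(col("event_time"), "1 minute"), col("device_id"))
          .count())

# Write results continuously to the console as each micro-batch completes.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())

query.awaitTermination()
```

The point to note is that the aggregation runs continuously as events arrive, rather than on data that has already landed in a warehouse.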
Is Streaming Analytics right for my business?

Not all organizations need streaming analytics - especially those that deal with static data or data that hardly changes over long intervals of time, or those that do not require real-time insights for decision-making. For instance, consider the HR unit of a call centre. It is sufficient, and more efficient, to use a traditional analytics solution to analyze thousands of past employee records rather than run them through a streaming analytics system. On the other hand, the same call centre can find real value in implementing streaming analytics for something like a real-time customer log monitoring system, where customer interactions and context-sensitive information are processed on the go. This can help the organization find opportunities to provide unique customer experiences and improve its customer satisfaction score, alongside a whole host of other benefits.

Streaming analytics is slowly finding adoption in a variety of domains where companies are looking for that crucial competitive advantage - sensor data analytics, mobile analytics, and business activity monitoring being some of them. With the rise of the Internet of Things, data from IoT devices is also increasing exponentially, and streaming analytics is the way to go there as well. In short, streaming analytics is ideal for businesses dealing with time-critical missions and those working with continuous streams of incoming data, where decision-making has to be instantaneous. Companies that obsess over real-time monitoring of their businesses will also find streaming analytics useful - just integrate your dashboards with your streaming analytics platform!

What next?

It is safe to say that, with time, the amount of information businesses manage is going to rise exponentially, and the nature of that information will keep evolving. As a result, it will get increasingly difficult to process volumes of unstructured data and gain insights from them using just the traditional analytics systems. Adopting streaming analytics into the business workflow will therefore become a necessity for many businesses. Apache Flink, Spark Streaming, Microsoft's Azure Stream Analytics, SQLstream Blaze, Oracle Stream Analytics and SAS Event Processing are all good places to begin your journey through the fleeting world of streaming analytics. You can browse through this list of learning resources from Packt to know more:

Learning Apache Flink
Learning Real Time processing with Spark Streaming
Real Time Streaming using Apache Spark Streaming (video)
Real Time Analytics with SAP Hana
Real-Time Big Data Analytics
Read more

article-image-blockchain-iot-security
Savia Lobo
29 Aug 2017
4 min read
Save for later

How Blockchain can level up IoT Security

Savia Lobo
29 Aug 2017
4 min read
IoT comprises hordes of sensors, vehicles, and other devices with embedded electronics that can communicate over the Internet. These IoT-enabled devices generate tons of data every second, and with IoT Edge Analytics they are getting much smarter - they can start or stop a request without any human intervention.

25 billion "things" will be connected to the internet by 2020. - Gartner Research

With so much data being generated by these devices, the question on everyone's mind is: will all this data be reliable and secure?

When Brains meet Brawn: Blockchain for IoT

Blockchain, an open distributed ledger, is highly secure and difficult for anyone connected over the network to manipulate or corrupt. It was initially designed for cryptocurrency-based financial transactions; Bitcoin is a famous example that has Blockchain as its underlying technology. Blockchain has come a long way since then and can now be used to store anything of value. So why not store data in it? That data will be secure, just like every digital asset in a Blockchain is.

Blockchain, decentralized and secure, is an ideal structure to form the underlying foundation for IoT data solutions. Current IoT devices and their data rely on a client-server architecture: all devices are identified, authenticated, and connected via cloud servers, which are capable of storing ample amounts of data. But this requires huge infrastructure, which is expensive. Blockchain not only provides an economical alternative but, since it works in a decentralized fashion, it also eliminates single points of failure, creating a much more secure and robust network for IoT devices. This makes IoT more secure and reliable, and customers can relax knowing their information is in safe hands.

Today, Blockchain's capabilities extend beyond processing financial transactions - it can now track billions of connected devices, process transactions and even coordinate between devices - a good fit for the IoT industry.

Why Blockchain is perfect for IoT

Inherently weak security features make IoT devices suspect. Blockchain, on the other hand, with its tamper-proof ledger, is hard to manipulate for malicious activities - making it the right infrastructure for IoT solutions.

Enhancing security through decentralization

Blockchain makes it hard for intruders to intervene, as it spans a network of secure blocks; a change at a single location therefore does not affect the other blocks. The data, or any other value, remains encrypted and is only visible to the person who encrypted it using a private key. The cryptographic algorithms used in Blockchain technology ensure that IoT data remains private, whether for an individual organization or for the organizations connected in a network.

Simplicity through autonomous, third-party-free transactions

Blockchain technology is already a star in the finance sector thanks to the adoption of smart contracts, Bitcoin and other cryptocurrencies. Apart from providing a secure medium for financial transactions, it eliminates the need for third-party brokers such as banks to guarantee peer-to-peer payment services. With Blockchain, IoT data can be treated in a similar manner: smart contracts can be made between devices to exchange messages and data. This type of autonomy is possible because each node in the blockchain network can verify the validity of a transaction without relying on a centralized authority.
Blockchain-backed IoT solutions will thus enable trustworthy message sharing: business partners can easily access and exchange confidential information within the IoT without a centralized management or regulatory authority. This means quicker transactions, lower costs and fewer opportunities for malicious intent such as data espionage.

Blockchain's immutability for predicting IoT security vulnerabilities

Blockchains maintain a history of all transactions made by smart devices connected within a particular network. This is possible because once you enter data into a Blockchain, it lives there forever in an immutable ledger. The possibilities for IoT solutions that leverage Blockchain's immutability are limitless. Some obvious use cases are more robust credit scores and preventive healthcare solutions that use data accumulated through wearables.

For all the above reasons, we expect to see significant Blockchain adoption by IoT-based businesses in the near future.
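To illustrate why an immutable, hash-chained ledger makes tampering easy to detect, here is a minimal Python sketch. It is an illustration only, not a real blockchain: there is no consensus protocol, networking, or smart contract layer, and the device names and readings are made up.

```python
# Minimal sketch of a hash-chained ledger of IoT sensor readings (illustration only).
import hashlib
import json
import time

def make_block(reading, previous_hash):
    """Bundle a sensor reading with a timestamp and the hash of the previous block."""
    block = {
        "timestamp": time.time(),
        "reading": reading,
        "previous_hash": previous_hash,
    }
    # The block's own hash covers its entire contents, chaining it to its predecessor.
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def verify_chain(chain):
    """Return True only if every block still links to an unmodified predecessor."""
    for prev, curr in zip(chain, chain[1:]):
        if curr["previous_hash"] != prev["hash"]:
            return False
    return True

# Append a few device readings to the ledger.
chain = [make_block({"device": "sensor-1", "temp_c": 21.4}, previous_hash="0" * 64)]
chain.append(make_block({"device": "sensor-1", "temp_c": 21.9}, chain[-1]["hash"]))
chain.append(make_block({"device": "sensor-2", "temp_c": 19.7}, chain[-1]["hash"]))

print(verify_chain(chain))   # True: the ledger is internally consistent

# Tampering with an earlier reading breaks every later link.
chain[0]["reading"]["temp_c"] = 99.9
chain[0]["hash"] = hashlib.sha256(json.dumps(
    {k: chain[0][k] for k in ("timestamp", "reading", "previous_hash")},
    sort_keys=True).encode()).hexdigest()
print(verify_chain(chain))   # False: block 1 no longer points at block 0's new hash
```

Because each block's hash covers the previous block's hash, silently editing an old sensor reading invalidates every block that follows it.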
Read more
article-image-python-r-war
Amey Varangaonkar
28 Aug 2017
7 min read
Save for later

Is Python edging R out in the data science wars?

Amey Varangaonkar
28 Aug 2017
7 min read
When it comes to the 'lingua franca' of data science, there seems to be a face-off between R and Python. R has long been established as the language of researchers and statisticians, but Python has come up quickly as a bona fide challenger, helping embed analytics as a necessity for businesses and other organizations in 2017. If a tech war does exist between the two languages, it's a battle fought not so much on technical features but instead on the wider changes within modern business and technology. R is a language purpose-built for statistics, for performing accurate and intensive analysis. So the fact that R is being challenged by Python — a language that is flexible, fast, and relatively easy to learn — suggests we are seeing a change in who's actually doing data science, where they're doing it, and what they're trying to achieve.

Python versus R — A Closer Look

Let's make a quick comparison of the two languages on aspects important to those working with data, and see what we can learn about the two worlds where R and Python operate.

Learning curve

Python is the easier language to learn. While R certainly isn't impenetrable, Python's syntax marks it as a great language to learn even if you're completely new to programming. The fact that such an easy language would come to rival R within data science indicates the pace at which the field is expanding. More and more people are taking on data-related roles, possibly without a great deal of programming knowledge — Python makes the barrier to entry much lower than R. That said, once you get to grips with the basics of R, it becomes relatively easier to learn the more advanced stuff. This is why statisticians and experienced programmers find R easier to use.

Packages and libraries

Many R packages are built in; Python, meanwhile, depends upon a range of external packages. This makes R much more self-contained as a statistical tool — if you're using Python, you need to know exactly what you're trying to do and what external support you're going to need.

Data Visualization

R is well known for its excellent graphical capabilities, which make it easy to present and communicate data in varied forms. For statisticians and researchers, the importance of that is obvious: you can perform your analysis and present your work in a way that is relatively seamless. The ggplot2 package in R, for example, allows you to create complex and elegant plots with ease, and as a result its popularity in the R community has increased over the years. Python also offers a wide range of libraries which can be used for effective data storytelling, and the breadth of external packages available means the scope of what's possible is always expanding. Matplotlib has been a mainstay of Python data visualization. It's also worth remarking on newer libraries like Seaborn, a neat little library that sits on top of Matplotlib, wrapping its functionality and giving you a neater API for specific applications. So, to sum up, you have sufficient options to perform your data visualization tasks effectively — using either R or Python!

Analytics and Machine Learning

Thanks to libraries like scikit-learn, Python helps you build machine learning systems with relative ease. This takes us back to the point about barrier to entry: if machine learning is upending how we use and understand data, it makes sense that more people want a piece of the action without having to put in too much effort.
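As a minimal illustration of that low barrier to entry, a scikit-learn sketch along the following lines trains and evaluates a classifier in roughly a dozen lines; the dataset and model choice here are arbitrary.

```python
# Minimal scikit-learn sketch: train and evaluate a classifier in a few lines.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a toy dataset and hold out a quarter of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit an off-the-shelf model and report its accuracy on the held-out data.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```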
Python also has another advantage: it's great for creating web services where data can be uploaded by different people. In a world where accessibility and data empowerment have never been more important (i.e., where everyone takes an interest in data, not just the data team), this could prove crucial. With packages such as caret, MICE, and e1071, R too gives you the power to perform effective machine learning and get crucial insights out of your data. However, R falls short in comparison to Python, thanks to the latter's superior libraries and more diverse use cases.

Deep Learning

Both R and Python have libraries for deep learning. It's much easier and more efficient with Python, though — most likely because the Python world changes much more quickly, with new libraries and tools springing up as quickly as the data science world hooks on to a new buzzword. Theano, and more recently Keras and TensorFlow, have all made a huge impact on making it relatively easy to build incredibly complex and sophisticated deep learning systems. If you're clued up and experienced with R, it shouldn't be too hard to do the same using libraries such as MXNetR, deepr, and H2O — that said, if you want to switch models, you may need to switch tools, which could be a bit of a headache.

Big Data

Both R and Python are equally good when it comes to working with Big Data, as they can be seamlessly integrated with Big Data tools such as Apache Spark and Apache Hadoop, among many others. With Python you can write efficient MapReduce applications with ease, and you can equally scale an R program on Hadoop to work with petabytes of data. It's likely that it's in this field that we're going to see R moving more and more into industry, as businesses look for a concise way to handle large datasets. This is true in industries such as bioinformatics, which have a close connection with the academic world and necessarily depend upon a combination of size and accuracy when it comes to working with data.

So, where does this comparison leave us? Ultimately, what we see are two different languages offering great solutions to very different problems in data science. In Python, we have a flexible and adaptable language with a vibrant community of developers working on a huge range of problems and tasks, each one trying to find more effective and more intelligent ways of doing things. In R, we have a purely statistical language with a large repository of over 8,000 packages for data analysis and visualization. While Python is production-ready and better suited for organizations looking to harness technical innovation to their advantage, R's analytical and data visualization capabilities can make your life as a statistician or data analyst easier.

Recent surveys indicate that Python commands a higher salary than R — that is because it's a language that can be used across domains; a problem-solving language. That's not to say that R isn't a valuable language; rather, Python is the language that just seems to fit the times at the moment. In the end, it all boils down to your background and the kind of data problems you want to solve. If you come from a statistics or research background and your problems revolve around statistical analysis and visualization, then R will best fit the bill. However, if you're a Computer Science graduate looking to build a general-purpose, enterprise-wide data model which can integrate seamlessly with other business workflows, you will find Python easier to use. R and Python are two different animals.
Instead of comparing the two, maybe it's time we understood where and how each can best be used, and then harnessed their power to the fullest to solve our data problems. One thing is for sure, though — neither is going away anytime soon. Both R and Python occupy a large chunk of the data science market share today, and it will take a major disruption to take either of them out of the equation completely.
Read more

article-image-level-your-companys-big-data-resource-management
Timothy Chen
24 Dec 2015
4 min read
Save for later

Level Up Your Company's Big Data With Resource Management

Timothy Chen
24 Dec 2015
4 min read
Big data was once one of the biggest technology hypes: tons of presentations and posts talked about how new systems and tools allow large and complex data to be processed in ways traditional tools couldn't manage. While Big data was at the peak of its hype, most companies were still getting familiar with the new data processing frameworks such as Hadoop, and new databases such as HBase and Cassandra. Fast forward to now: Big data is still a popular topic, lots of companies have already jumped on the Big data bandwagon, and many are moving past first-generation Hadoop to evaluate newer tools such as Spark and newer databases such as Firebase, NuoDB or MemSQL.

But most companies also learn from running all of these tools that deploying, operating, and planning capacity for them is very hard and complicated. Although over time lots of these tools have become more mature, they usually still run in their own independent clusters. It's also not rare to find multiple Hadoop clusters in the same company, since multi-tenancy isn't built into many of these tools and you run the risk of overloading a cluster with a few non-critical big data jobs.

Problems running independent Big data clusters

There are a lot of problems when you run many independent clusters. One of them is monitoring and visibility: each cluster has its own management tools, and integrating them with the company's shared monitoring and management tooling is a huge challenge, especially when onboarding yet another framework with yet another cluster.

Another problem is multi-tenancy. Having independent clusters sidesteps the risk of another org's job taking over the whole cluster, but it still doesn't help when a bug in a Hadoop application simply uses up all the available resources - and the pain of debugging this is horrific.

Another problem is utilization: a cluster is usually not 100% utilized, and all of those instances running in Amazon or in your datacenter are just racking up bills for doing no work. There are more major pain points that I don't have time to get into.

Hadoop v2

The Hadoop developers and operators saw this problem, and in the second generation of Hadoop they developed a separate resource management tool called YARN: a single management framework that manages all of the resources in the cluster, enforces the resource limits of jobs, integrates security into the workload, and even optimizes the workload by automatically placing jobs closer to the data. This solves a huge problem when operating a Hadoop cluster, and it also lets you consolidate multiple Hadoop clusters into one, since it allows finer-grained control over the workload and improves the efficiency of the cluster.
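To see why consolidation helps utilization, here is a back-of-the-envelope Python sketch comparing how much capacity you need when each framework runs in its own silo versus sharing one cluster. The workload numbers are entirely made up for illustration; the point is that siloed clusters must each be sized for their own peak, while a shared cluster only needs to cover the peak of the combined load.

```python
# Toy utilization sketch: capacity needed for siloed clusters vs one shared cluster.
# Hourly CPU demand (cores) for three hypothetical frameworks over a day.
hadoop = [100, 80, 80, 80, 100, 200, 400, 650, 700, 680, 600, 550,
          500, 480, 450, 400, 350, 300, 250, 200, 150, 120, 100, 100]
spark  = [400, 450, 500, 480, 300, 150, 80, 60, 50, 50, 50, 60,
          60, 60, 70, 80, 100, 150, 250, 350, 450, 500, 480, 420]
hbase  = [100] * 24   # steady serving workload

# Siloed: each cluster is provisioned for its own peak demand.
siloed_capacity = max(hadoop) + max(spark) + max(hbase)

# Shared: one cluster only needs to cover the peak of the combined demand.
shared_capacity = max(h + s + b for h, s, b in zip(hadoop, spark, hbase))

print(f"siloed clusters need   {siloed_capacity} cores")
print(f"one shared cluster needs {shared_capacity} cores")
print(f"savings: {100 * (1 - shared_capacity / siloed_capacity):.0f}%")
```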
Beyond Hadoop

With the vast number of Big data technologies now growing in the ecosystem, there is a need for a common resource management layer across all of the tools; without a single resource management system spanning all the frameworks, we run straight back into the problems mentioned above. And when all these frameworks run under one resource management platform, a lot of options for optimization and resource scheduling become possible. Here are some examples of what could be possible with one resource management platform:

The platform can understand the entire cluster workload and the available resources, and can automatically resize and scale up and down based on workloads across all these tools.
It can also resize jobs according to priority.
The cluster can detect under-utilization in some jobs and offer the slack resources to Spark batch jobs without impacting your very important workloads from other frameworks, maintaining the same business deadlines and saving a lot of cost.

In the next post I'll cover Mesos, one such resource management system, and how its upcoming features make the optimizations I mentioned possible. For more Big Data tutorials and analysis, visit our dedicated Hadoop and Spark pages.

About the author

Timothy Chen is a distributed systems engineer and entrepreneur. He works at Mesosphere and can be found on Github @tnachen.
Read more

article-image-level-your-companys-big-data-mesos
Timothy Chen
23 Dec 2015
5 min read
Save for later

Level Up Your Company's Big Data with Mesos

Timothy Chen
23 Dec 2015
5 min read
In my last post I talked about how using a resource management platform can make your Big Data workloads more efficient with fewer resources. In this post I want to continue the discussion with a specific resource management platform: Mesos.

Introduction to Mesos

Mesos is an Apache top-level project that provides an abstraction over your datacenter resources and an API to program against these resources to launch and manage your workloads. Mesos can manage your CPU, memory, disk, ports and other resources that the user can custom-define. Every application that wants to use datacenter resources to run tasks talks to Mesos through a component called a scheduler. A scheduler uses the scheduler API to receive resource offers, and can decide to use an offer, decline it and wait for future ones, or hold on to it for a period of time in order to combine resources. Mesos ensures fairness amongst multiple schedulers so that no one scheduler can take over all the resources.

So how do your Big data frameworks benefit from using Mesos in your datacenter?

Autopilot your Big data frameworks

The first benefit of running your Big data frameworks on top of Mesos is that, by abstracting away resources and providing an API to program against your datacenter, it allows each Big data framework to manage itself with minimal human intervention.

How does the Mesos scheduler API provide self-management to frameworks? First we should understand a little more about what the scheduler API allows you to do. It provides a set of callbacks that fire whenever the following events occur: new resources available, task status changed, slave lost, executor lost, scheduler registered/disconnected, and so on. By reacting to each event with the Big data framework's specific logic, frameworks can deploy themselves, handle failures, scale and more.

Using Spark as an example: when a new Spark job is launched, it launches a new scheduler that waits for resources from Mesos. When new resources are available, it deploys Spark executors to those nodes automatically, provides Spark task information to the executors, and communicates the results back to the scheduler. When a task terminates unexpectedly for some reason, the Spark scheduler receives the notification and can automatically relaunch that task on another node and attempt to resume the job. When a machine crashes, the Spark scheduler is also notified and can relaunch all the executors that were on that node using other available resources. Moreover, since the Spark scheduler can choose where to launch tasks, it can pick the nodes that provide the most data locality to the data it is going to process, and it can deploy Spark executors across different racks for higher availability if it's a long-running Spark Streaming job.

As you can see, programming against an API gives the Big data frameworks a lot of flexibility and self-management, and saves a lot of manual scripting and automation.

Manage your resources among frameworks and users

When multiple Big data frameworks share the same cluster, and each framework is shared by multiple users, having a good policy for ensuring that the important users and jobs get executed becomes very important. Mesos allows you to specify roles, where multiple frameworks can belong to a role. Operators can then specify weights among these roles, and Mesos enforces fair sharing so that resources are provided according to the weights specified. For example, one might give 70% of resources to Spark and 30% to general tasks using weighted roles. Mesos also allows reserving a fixed amount of resources per agent for a specific role, which ensures that your important workload is guaranteed to have enough resources to complete.
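As a rough illustration of how weighted roles translate into scheduling decisions, here is a toy Python sketch. This is not the Mesos API; the allocation rule is deliberately simplified to show the idea of handing the next resource offer to whichever role is furthest below its weighted fair share.

```python
# Toy sketch (not the real Mesos API): weighted fair sharing between roles.
ROLE_WEIGHTS = {"spark": 7, "general": 3}          # e.g. a 70/30 split
allocated_cpus = {"spark": 0.0, "general": 0.0}    # CPUs currently held by each role

def next_role_to_offer():
    """Pick the role whose current allocation is furthest below its weighted share."""
    total_weight = sum(ROLE_WEIGHTS.values())
    total_alloc = sum(allocated_cpus.values()) or 1.0   # avoid division by zero at start

    def deficit(role):
        fair_share = ROLE_WEIGHTS[role] / total_weight
        actual_share = allocated_cpus[role] / total_alloc
        return fair_share - actual_share

    return max(ROLE_WEIGHTS, key=deficit)

# Simulate a stream of 4-CPU offers being handed out; allocations converge to ~70/30.
for _ in range(10):
    role = next_role_to_offer()
    allocated_cpus[role] += 4
    print(f"offer -> {role:8s} allocation={allocated_cpus}")
```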
There are more features coming to Mesos that also help multi-tenancy. One feature, called Quota, ensures that a certain amount of resources is reserved across the whole cluster rather than per agent. Another feature, called dynamic reservation, allows frameworks and operators to reserve a certain amount of resources at runtime and unreserve them once they're no longer necessary.

Optimize your resources among frameworks

Using Mesos also boosts utilization, by allowing tasks from different frameworks to share the same cluster instead of running in separate clusters. A number of features currently being worked on will boost utilization even further. The first, called oversubscription, uses tasks' runtime statistics to estimate the amount of resources not being used by those tasks, and offers this slack to other schedulers so that more of the cluster is actually utilized. The oversubscription controller also monitors the tasks, and when a workload is being hurt by the shared resources, it kills the oversubscribed tasks so the original workload is no longer affected. Another feature, called optimistic offers, allows multiple frameworks to compete for resources. This helps utilization by allowing faster scheduling, and gives the Mesos scheduler more inputs for choosing how to best schedule its resources in the future.

As you can see, Mesos allows your Big data frameworks to be self-managed and more efficient, and enables optimizations that are only possible when frameworks share the same resource management layer. If you're curious about how to get started, you can visit the Mesos website, or the Mesosphere website, which provides even simpler tools for using your Mesos cluster. Want more Big Data tutorials and insight? Both our Spark and Hadoop pages have got you covered.

About the author

Timothy Chen is a distributed systems engineer and entrepreneur. He works at Mesosphere and can be found on Github @tnachen.
Read more
article-image-biggest-big-data-and-business-intelligence-salary-and-skills-survey-2015
Packt Publishing
03 Aug 2015
1 min read
Save for later

The biggest Big Data & Business Intelligence salary and skills survey of 2015

Packt Publishing
03 Aug 2015
1 min read
See the highlights from our comprehensive Skill Up IT industry salary reports, with data from over 20,000 IT professionals. Find out what trends are emerging in the world of data science and business intelligence and what skills you should be learning to further your career. Download the full size infographic here.    
Read more

article-image-reducing-cost-big-data-using-statistics-and-memory-technology-part-2
Praveen Rachabattuni
06 Jul 2015
6 min read
Save for later

Reducing Cost in Big Data using Statistics and In-memory Technology - Part 2

Praveen Rachabattuni
06 Jul 2015
6 min read
In the first part of this two-part blog series, we learned that using statistical algorithms gives us a 95 percent accuracy rate for big data analytics, is faster, and is a lot more beneficial than waiting for the exact results. We also took a look at a few algorithms along with a quick introduction to Spark. Now let's take an in-depth look at two tools that are used with statistical algorithms: Apache Spark and Apache Pig.

Apache Spark

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, and Python, as well as an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

At its core, Spark provides a general programming model that enables developers to write applications by composing arbitrary operators, such as mappers, reducers, joins, group-bys, and filters. This composition makes it easy to express a wide array of computations, including iterative machine learning, streaming, complex queries, and batch processing. In addition, Spark keeps track of the data that each of the operators produces, and enables applications to reliably store this data in memory. This is the key to Spark's performance, as it allows applications to avoid costly disk accesses. It would be wonderful to have one tool for everyone, and one architecture and language for investigative as well as operational analytics.

Spark's ease of use comes from its general programming model, which does not constrain users to structure their applications into a bunch of map and reduce operations. Spark's parallel programs look very much like sequential programs, which makes them easier to develop and reason about. Finally, Spark allows users to easily combine batch, interactive, and streaming jobs in the same application. As a result, a Spark job can be up to 100 times faster than an equivalent Hadoop job, while requiring 2 to 10 times less code.

Spark allows users and applications to explicitly cache a dataset by calling the cache() operation. This means that your applications can access data from RAM instead of disk, which can dramatically improve the performance of iterative algorithms that access the same dataset repeatedly. This use case covers an important class of applications, as all machine learning and graph algorithms are iterative in nature.

When constructing a complex pipeline of MapReduce jobs, the task of correctly parallelizing the sequence of jobs is left to you; a scheduler tool such as Apache Oozie is often required to carefully construct this sequence. With Spark, a whole series of individual tasks is expressed as a single program flow that is lazily evaluated, so that the system has a complete picture of the execution graph. This approach allows the core scheduler to correctly map the dependencies across different stages in the application, and automatically parallelize the flow of operators without user intervention.

With a low-latency data analysis system at your disposal, it's natural to extend the engine towards processing live data streams. Spark has an API for working with streams, providing exactly-once semantics and full recovery of stateful operators. It also has the distinct advantage of giving you the same Spark APIs to process your streams, including reuse of your regular Spark application code.
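As a minimal illustration of the caching behaviour described above, the following PySpark sketch keeps a filtered dataset in memory so that repeated queries over it avoid re-reading from disk. The input path and filter strings are hypothetical, included purely to make the example self-contained.

```python
# Minimal PySpark sketch: cache a dataset that an iterative job touches repeatedly.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-sketch").getOrCreate()

# Hypothetical input path, for illustration only.
logs = spark.read.text("hdfs:///data/events/*.log")

# Keep the filtered subset in memory after its first use.
errors = logs.filter(logs.value.contains("ERROR")).cache()

# Each action below reuses the in-memory copy instead of re-reading from disk.
print("total errors:", errors.count())
print("timeouts:", errors.filter(errors.value.contains("timeout")).count())
print("oom errors:", errors.filter(errors.value.contains("OutOfMemory")).count())
```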
Pig on Spark

Pig on Spark combines the power and simplicity of Apache Pig with Apache Spark, making existing ETL pipelines 100 times faster than before. We do that via a unique mix of our operator toolkit, called DataDoctor, and Spark. The primary goals for the project are to:

Make data processing more powerful
Make data processing simpler
Make data processing 100 times faster than before

DataDoctor is a high-level operator DSL on top of Spark. It has frameworks for non-symmetrical joins, sorting, grouping, and embedding native Spark functions. It hides a lot of complexity and makes it simple to implement data operators used in applications like Pig and Apache Hive on Spark.

Pig operates in a similar manner to big data applications like Hive and Cascading. It has a query language quite akin to SQL that allows analysts and developers to design and write data flows. The query language is translated into a "logical plan" that is further translated into a "physical plan" containing operators. Those operators are then run on the designated execution engine (MapReduce, Apache Tez, and now Spark). There are a whole bunch of details around tracking progress, handling errors, and so on that I will skip here.

Query planning on Spark will vary significantly from MapReduce, as Spark handles data wrangling in a much more optimized way. Further query planning can benefit greatly from the ongoing effort on Catalyst inside Spark. At this moment, we have simply introduced a SparkPlanner that will undertake the conversion from a logical to a physical plan for Pig. Databricks is working actively to enable Catalyst to handle much of the operator optimizations that will plug into SparkPlanner in the near future. Longer term, we plan to rely on Spark itself for logical plan generation. An early version of this integration has been prototyped in partnership with Databricks.

Pig Core hands off Spark execution to SparkLauncher with the physical plan. SparkLauncher creates a SparkContext providing all the Pig dependency JAR files and Pig itself. SparkLauncher gets an MR plan object created from the physical plan. At this point, we override all the Pig operators with DataDoctor operators recursively throughout the whole plan. Two iterations are performed over the plan — one that looks at the store operations and recursively travels down the execution tree, and a second that does a breadth-first traversal over the plan and calls convert on each of the operators. The base class of converters in DataDoctor is a POConverter class, which defines the abstract method convert that is called during plan execution. More details of Pig on Spark can be found at PIG4059.

As we merge with Apache Pig, we need to focus on the following enhancements to further improve the speed of Pig:

Cache operator: adding a new operator to explicitly tell Spark to cache certain datasets for faster execution
Storage hints: allowing the user to specify the storage location of datasets in Spark for better control of memory
YARN and Mesos support: adding resource manager support for more global deployment and support

Conclusion

In many large-scale data applications, statistical perspectives provide us with fruitful analytics in many ways, including speed and efficiency.

About the author

Praveen Rachabattuni is a tech lead at Sigmoid Analytics, a company that provides a real-time streaming and ETL framework on Apache Spark. Praveen is also a committer to Apache Pig.
Read more