
How-To Tutorials

7019 Articles
Richard Gall
15 Mar 2018
3 min read

5 polarizing Quotes from Professor Stephen Hawking on artificial intelligence

Professor Stephen Hawking died today (March 14, 2018) aged 76 at his home in Cambridge, UK. Best known for his theory of cosmology that unified quantum mechanics with Einstein's General Theory of Relativity, and for his book A Brief History of Time, which brought his concepts to a wider general audience, Professor Hawking is quite possibly one of the most important and well-known voices in the scientific world.

Among many things, Professor Hawking had a lot to say about artificial intelligence: its dangers, its opportunities, and what we should be thinking about, not just as scientists and technologists, but as humans. Over the years, Hawking remained cautious and consistent in his views on the topic, constantly urging AI researchers and machine learning developers to consider the wider implications of their work on society and the human race itself.

The machine learning community is quite divided on all the issues Hawking raised, and will probably continue to be so as the field grows faster than it can be fathomed. Here are five widely debated things Stephen Hawking said about AI, arranged in chronological order. If you're going to listen to anyone, you've surely got to listen to him.

On artificial intelligence ending the human race

"The development of full artificial intelligence could spell the end of the human race.... It would take off on its own, and re-design itself at an ever-increasing rate. Humans, who are limited by slow biological evolution, couldn't compete and would be superseded."

From an interview with the BBC, December 2014

On the future of AI research

"The establishment of shared theoretical frameworks, combined with the availability of data and processing power, has yielded remarkable successes in various component tasks such as speech recognition, image classification, autonomous vehicles, machine translation, legged locomotion, and question-answering systems. As capabilities in these areas and others cross the threshold from laboratory research to economically valuable technologies, a virtuous cycle takes hold whereby even small improvements in performance are worth large sums of money, prompting greater investments in research. There is now a broad consensus that AI research is progressing steadily, and that its impact on society is likely to increase.... Because of the great potential of AI, it is important to research how to reap its benefits while avoiding potential pitfalls."

From Research Priorities for Robust and Beneficial Artificial Intelligence, an open letter co-signed by Hawking, January 2015

On AI emulating human intelligence

"I believe there is no deep difference between what can be achieved by a biological brain and what can be achieved by a computer. It, therefore, follows that computers can, in theory, emulate human intelligence, and exceed it."

From a speech given by Hawking at the opening of the Leverhulme Centre for the Future of Intelligence, Cambridge, U.K., October 2016

On making artificial intelligence benefit humanity

"Perhaps we should all stop for a moment and focus not only on making our AI better and more successful but also on the benefit of humanity."

Taken from a speech given by Hawking at Web Summit in Lisbon, November 2017

On AI replacing humans

"The genie is out of the bottle. We need to move forward on artificial intelligence development but we also need to be mindful of its very real dangers. I fear that AI may replace humans altogether. If people design computer viruses, someone will design AI that replicates itself. This will be a new form of life that will outperform humans."

From an interview with Wired, November 2017

Packt
13 Jan 2014
6 min read

Using IPv6 on Packet Tracer

This article is written by Jesin A, the author of Packet Tracer Network Simulator. Cisco Packet Tracer is a powerful network simulation program that provides simulation, visualization, authoring, assessment, and collaboration capabilities. This article explains the IPv6 addresses used in Packet Tracer.

IPv4 has 4.3 billion addresses, which may seem mindboggling. However, it took only two decades for it to reach its depletion. IPv6 has come to the rescue in the form of 128-bit addresses. Packet Tracer supports a wide array of IPv6 features. We'll start by learning how to assign IP addresses to different devices and how to configure routing between them. Finally, we'll create a setup that enables IPv6 communication over IPv4 devices.

Assigning IPv6 addresses

Starting from Packet Tracer version 6, the IP Configuration utility under the Desktop tab of end devices has an option to enter an IPv6 address. Let's begin with a simple topology consisting of two PCs and a router connected to a switch, as shown in the following screenshot:

There are three ways of assigning IPv6 addresses to a device and we'll see each one of them.

Autoconfiguration

Autoconfiguration requires the least amount of configuration but makes it difficult to remember the IPv6 addresses. This method uses the MAC address of the device to create an IPv6 address with the FE80:: prefix. Carry out the following steps to assign IPv6 addresses using Autoconfiguration:

1. Begin by configuring the router. Enter the interface configuration mode and enable IPv6 on the interface.

    R0(config)#ipv6 unicast-routing
    R0(config)#interface FastEthernet0/0
    R0(config-if)#ipv6 enable

2. Next, we will configure a link-local address and a global unicast address on this interface. We'll use eui-64 to reduce the configuration.

    R0(config-if)#ipv6 address autoconfig
    R0(config-if)#ipv6 add 2000::/64 eui-64
    R0(config-if)#no shutdown

3. Verify that the interface is up and has two IPv6 addresses.

    R0>sh ipv6 interface brief
    FastEthernet0/0 [up/up]
        FE80::2D0:58FF:FE65:E701
        2000::2D0:58FF:FE65:E701

   These IPv6 addresses may vary when you try them out, as they are based on the MAC address.

4. Enable routing so that this router can be identified as a default gateway.

    R0(config)#ipv6 unicast-routing

5. The configuration of the router is now done, so let's move on to the PCs. Go to the Desktop tab of the PC, open IP Configuration, and under the IPv6 Configuration section, choose Auto Config. The gateway and the PC's IP address will be assigned automatically, as shown in the following screenshot:

6. Use the simple PDU tool to test the connectivity; you'll see ICMPv6 packets moving between the nodes.

To view the IPv6 address from the command line of PCs, use the ipv6config command.

Static IPv6

IPv6 addresses can also be assigned statically on all devices. We'll use the same topology for this section too. We'll carry out the following steps to configure IPv6 addresses statically:

1. Begin by configuring a static IPv6 address on the router.

    R0(config)#interface fastethernet0/0
    R0(config-if)#ipv6 enable
    R0(config-if)#ipv6 address 2000::1/64
    R0(config-if)#no shutdown

2. Go to the Desktop tab of the PC, open the IP Configuration utility, and enter an IPv6 address with the same prefix.

3. Now use the simple PDU tool to test the connectivity.

Once both the methods work fine, you can have a look at the IPv6 neighbors table. This is similar to the ARP table of IPv4.

    R0#sh ipv6 neighbor
    IPv6 Address    Age  Link-layer Addr  State  Interface
    2000::2         0    00E0.A39E.05C4   REACH  Fa0/0
    2000::3         0    0001.43B9.0268   REACH  Fa0/0

Now that we have configured IPv6 addresses on a single network, let's configure them on more networks and enable routing between them.

IPv6 static and dynamic routing

Like IPv4, IPv6 supports both static and dynamic routing. The configuration commands for IPv6 static routing are similar to those for IPv4.
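Before moving on to routing, the eui-64 addresses seen in the Autoconfiguration section can be reproduced by hand. The following Python sketch shows the modified EUI-64 rule: flip the universal/local bit of the first MAC octet and insert FF:FE in the middle of the MAC address. The function name eui64_address is hypothetical, and the MAC address 00D0.5865.E701 is an assumption worked backwards from the addresses in the sh ipv6 interface brief output above; the article itself does not list the router's MAC.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a modified EUI-64 interface ID built from a MAC.

    Hypothetical helper for illustration; not part of Packet Tracer or IOS.
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("EUI-64 addressing needs a /64 prefix")

    # Accept Cisco (00D0.5865.E701), colon, or hyphen notation.
    hex_digits = mac.replace(".", "").replace(":", "").replace("-", "")
    octets = bytearray.fromhex(hex_digits)
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 6 octets")

    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert FF:FE between the two halves of the MAC address.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network.network_address + int.from_bytes(interface_id, "big")


# Assumed MAC, reverse-derived from the router output shown earlier.
print(eui64_address("FE80::/64", "00D0.5865.E701"))  # fe80::2d0:58ff:fe65:e701
print(eui64_address("2000::/64", "00D0.5865.E701"))  # 2000::2d0:58ff:fe65:e701
```

Python prints IPv6 addresses in lowercase, so the results match the router output apart from letter case. Because the interface ID depends only on the MAC address, the same 64 low-order bits appear under both the link-local and the 2000::/64 prefix.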
Static routing

Modifying the same topology that we used previously, let's add a router, a switch, and two PCs to create a separate network, as shown in the following screenshot:

The first network will use addresses starting from 2000:1::/64 and the second network will use addresses starting from 2000:2::/64. The link between both the routers will have IP addresses 2001::10/64 and 2001::20/64. Here is a table describing the topology:

    Device  Interface        IP address
    R1      FastEthernet0/0  2000:1::1/64
            FastEthernet0/1  2001::10/64
    PC0     FastEthernet     2000:1::2/64
    PC1     FastEthernet     2000:1::3/64
    R2      FastEthernet0/0  2000:2::1/64
            FastEthernet0/1  2001::20/64
    PC2     FastEthernet     2000:2::2/64
    PC3     FastEthernet     2000:2::3/64

After the necessary IP addresses and gateways have been assigned, open the CLI tab for the R1 router, and start configuring routing with the following commands:

    R1(config)#ipv6 unicast-routing
    R1(config)#ipv6 route 2000:2::/64 2001::20

Next, open the CLI tab for R2 and configure routing on it.

    R2(config)#ipv6 unicast-routing
    R2(config)#ipv6 route 2000:1::/64 2001::10

Now use the simple PDU tool to test the connectivity. You may also use the tracert command on a PC to see the path a packet takes.

    PC>tracert 2000:2::3
    Tracing route to 2000:2::3 over a maximum of 30 hops:
      1   63 ms    63 ms    47 ms   2000:1::1
      2   94 ms    78 ms    94 ms   2001::20
      3   156 ms   109 ms   129 ms  2000:2::3
    Trace complete.

Dynamic routing

Packet Tracer offers the same dynamic routing protocols for IPv6: RIPv6, EIGRP, and OSPF. We'll be configuring RIPv6 in this section. Note that RIPv6 does not mean RIP version 6; it is RIP for IPv6 addresses, standardized as RIPng.
For this exercise, we'll use the topology shown in the following screenshot:

The additional IP assignment details alone are shown in the following table:

    Device  Interface        IPv6 address
    R2      FastEthernet1/0  2001:1::10/64
    R3      FastEthernet0/0  2000:3::1/64
            FastEthernet0/1  2001:1::20/64
    PC2     FastEthernet     2000:3::2/64

We'll see how to configure RIP on one router and you can do the same on the others.

    R1(config)#interface FastEthernet0/0
    R1(config-if)#ipv6 address 2000:1::1/64
    R1(config-if)#ipv6 rip Net1 enable
    R1(config-if)#ipv6 enable
    R1(config-if)#interface FastEthernet0/1
    R1(config-if)#ipv6 address 2001::10/64
    R1(config-if)#ipv6 rip Net1 enable
    R1(config-if)#ipv6 enable

Note that the ipv6 rip command is used to enable RIP on a particular interface. Entering ipv6 rip Net1 enable on the first interface starts the RIPv6 process. The Net1 string is simply the name of the RIP process, and any name can be used. Once configured, use the usual diagnostic tools (from ping to the simple PDU tool) to check the connectivity. To view the RIP database, use the following command:

    R1#sh ipv6 rip database
    RIP process "Net1" local RIB
     2000:2::/64, metric 2, installed
        FastEthernet0/1/FE80::201:97FF:FE87:E5A9, expires in 173 sec
     2000:3::/64, metric 3, installed
        FastEthernet0/1/FE80::201:97FF:FE87:E5A9, expires in 173 sec
     2001::/64, metric 2
        FastEthernet0/1/FE80::201:97FF:FE87:E5A9, expires in 173 sec
     2001:1::/64, metric 2, installed
        FastEthernet0/1/FE80::201:97FF:FE87:E5A9, expires in 173 sec
    RIP process "LINK" local RIB

Trace the route of the packet to see the path it takes.

    PC>tracert 2000:3::2
    Tracing route to 2000:3::2 over a maximum of 30 hops:
      1   31 ms    32 ms    31 ms   2000:1::1
      2   50 ms    50 ms    63 ms   2001::20
      3   94 ms    94 ms    94 ms   2001:1::20
      4   125 ms   109 ms   125 ms  2000:3::2
    Trace complete.

Summary

In this article, we learned how to use IPv6 with Packet Tracer. We saw the limitation of the IPv4 addresses. We also learned how to assign IPv6 addresses and how to configure IPv6 static and dynamic routing.

Pravin Dhandre
25 Jul 2018
17 min read

4 powerful custom visuals in Power BI: Why, When, and How to add [Tutorial]

Power BI report authors and BI teams are well-served to remain conscious of both the advantages and limitations of custom visuals. For example, when several measures or dimension columns need to be displayed within the same visual, custom visuals such as the Impact Bubble Chart and the Dot Plot by Maq Software may exclusively address this need. In many other scenarios, a trade-off or compromise must be made between the incremental features provided by a custom visual and the rich controls built into a standard Power BI visual.

In this tutorial, we show how to add a custom visual to Power BI and explore four powerful custom visuals, along with the distinct scenarios and features they support. The Power BI tutorial is taken from Mastering Microsoft Power BI.

Custom visuals available in AppSource and within the integrated custom visuals store for Power BI Desktop are all approved for running in browsers and on mobile devices via the Power BI mobile apps. A subset of these visuals has been certified by Microsoft and supports additional Power BI features such as email subscriptions and export to PowerPoint. Additionally, certified custom visuals have met a set of code requirements and have passed strict security tests. Microsoft publishes the list of certified custom visuals and additional details on the certification process.

Adding a custom visual

Custom visuals can be added to Power BI reports by either downloading .pbiviz files from Microsoft AppSource or via the integrated Office Store of custom visuals in Power BI Desktop. Utilizing AppSource requires the additional step of downloading the file, and it can be more difficult to find the appropriate visual as the visuals are not categorized. However, AppSource provides a link to download a sample Power BI report (.pbix file) to learn how the visual is used, such as how it uses field inputs and formatting options.
Additionally, AppSource includes a short video tutorial on building report visualizations with the custom visual. The following image reflects Microsoft AppSource filtered by the Power BI visuals Add-ins category:

The following link filters AppSource to the Power BI custom visuals per the preceding image: http://bit.ly/2BIZZbZ. The search bar at the top and the vertical scrollbar on the right can be used to browse and identify custom visuals to download. Each custom visual tile in AppSource includes a Get it now link which, if clicked, presents the option to download either the custom visual itself (.pbiviz file) or the sample report for the custom visual (.pbix file). Clicking anywhere else in the tile other than Get it now opens a window with a detailed overview of the visual, a video tutorial, and customer reviews.

To add custom visuals directly to Power BI reports, click the Import from store option via the ellipsis of the Visualizations pane, as per the following image:

If a custom visual (.pbiviz file) has been downloaded from AppSource, the Import from file option can be used to import this custom visual to the report. Additionally, both the Import from store and Import from file options are available as icons on the Home tab of the Report view in Power BI Desktop.

Selecting Import from store launches an MS Office Store window of Power BI Custom Visuals. Unlike AppSource, the visuals are assigned to categories such as KPIs, Maps, and Advanced Analytics, making it easy to browse and compare related visuals. More importantly, utilizing the integrated Custom Visuals store avoids the need to manage .pbiviz files and allows report authors to remain focused on report development.

As an alternative to the VISUALIZATIONS pane, the From Marketplace and From File icons on the Home tab of the Report view can also be used to add a custom visual.
Clicking the From Marketplace icon in the following image launches the same MS Office Store window of Power BI Custom visuals as selecting Import from store via the VISUALIZATIONS pane:

In the following image, the KPIs category of Custom visuals is selected from within the MS Office store:

The Add button will directly add the custom visual as a new icon in the Visualizations pane. Selecting the custom visual icon will provide a description of the custom visual and any customer reviews. The Power BI team regularly features new custom visuals in the blog post and video associated with the monthly update to Power BI Desktop. The visual categories, customer reviews, and supporting documentation and sample reports all assist report authors in choosing the appropriate visual and using it correctly.

Organizations can also upload custom visuals to the Power BI service via the organization visuals page of the Power BI Admin portal. Once uploaded, these visuals are exposed to report authors in the MY ORGANIZATION tab of the custom visuals MARKETPLACE as per the following example:

This feature can help both organizations and report authors simplify their use of custom visuals by defining and exposing a particular set of approved custom visuals. For example, a policy could define that new Power BI reports must only utilize standard and organizational custom visuals. The list of organizational custom visuals could potentially only include a subset of the visuals which have been certified by Microsoft. Alternatively, an approval process could be implemented so that the use case for a custom visual would have to be proven or validated prior to adding this visual to the list of organizational custom visuals.

Power KPI visual

Key Performance Indicators (KPIs) are often prominently featured in Power BI dashboards and in the top left area of Power BI report pages, given their ability to quickly convey important insights.
Unlike card and gauge visuals, which only display a single metric or a single metric relative to a target respectively, KPI visuals support trend, variance, and conditional formatting logic. For example, without analyzing any other visuals, a user could be drawn to a red KPI indicator symbol and immediately understand the significance of a variance to a target value as well as the recent performance of the KPI metric. For some users, particularly executives and senior managers, a few KPI visuals may represent their only exposure to an overall Power BI solution, and this experience will largely define their impression of Power BI's capabilities and the Power BI project.

Given their power and important use cases, report authors should become familiar with both the standard KPI visual and the most robust custom KPI visuals such as the Power KPI Matrix, the Dual KPI, and the Power KPI. Each of these three visuals has been developed by Microsoft and provides additional options for displaying more data and customizing the formatting and layout.

The Power KPI Matrix supports scorecard layouts in which many metrics can be displayed as rows or columns against a set of dimension categories such as Operational and Financial. The Dual KPI, which was featured in the Microsoft Power BI Cookbook (https://www.packtpub.com/big-data-and-business-intelligence/microsoft-power-bi-cookbook), is a good choice for displaying two closely related metrics such as the volume of customer service calls and the average waiting time for customer service calls.

One significant limitation of custom KPI visuals is that data alerts cannot be configured on the dashboard tiles reflecting these visuals in the Power BI service. Data alerts are currently exclusive to the standard card, gauge, and KPI visuals.
In the following Power KPI visual, Internet Net Sales is compared to Plan, and the prior year Internet Net Sales and Year-over-Year Growth percent metrics are included to support the context:

The Internet Net Sales measure is formatted as a solid, green line whereas the Internet Sales Plan and Internet Net Sales (PY) measures are formatted with Dotted and Dot-dashed line styles respectively. To avoid clutter, the Y-Axis has been removed and the Label Density property of the Data labels formatting card has been set to 50 percent. This level of detail (three measures with variances) and formatting makes the Power KPI one of the richest visuals in Power BI.

The Power KPI provides many options for report authors to include additional data and to customize the formatting logic and layout. Perhaps its best feature, however, is the Auto Scale property, which is enabled by default under the Layout formatting card. For example, in the following image, the Power KPI visual has been pinned to a Power BI dashboard and resized to the smallest tile size possible:

As per the preceding dashboard tile, the less critical data elements such as July through August and the year-over-year % metric were removed. This auto scaling preserved space for the KPI symbol, the axis value (2017-Nov), and the actual value ($296K). With Auto Scale, a large Power KPI custom visual can be used to provide granular details in a report and then re-used in a more compact format as a tile in a Power BI dashboard.

Another advantage of the Power KPI is that minimal customization of the data model is required. The following image displays the dimension column and measures of the data model mapped to the field inputs of the aforementioned Power KPI visual:

The Sales and Margin Plan data is available at the monthly grain and thus the Calendar Yr-Mo column is used as the Axis input.
In other scenarios, a Date column would be used for the Axis input provided that the actual and target measures both support this grain.

The order of the measures used in the Values field input is interpreted by the visual as the actual value, the target value, and the secondary value. In this example, Internet Net Sales is the first or top measure in the Values field and thus is used as the actual value (for example, $296K for November). A secondary value as the third measure in the Values input (Internet Net Sales (PY)) is not required if the intent is to only display the actual value versus its target.

The KPI Indicator Value and Second KPI Indicator Value fields are also optional. If left blank, the Power KPI visual will automatically calculate these two values as the percentage difference between the actual value and the target value, and the actual value and the secondary value respectively. In this example, these two calculations are already included as measures in the data model and thus applying the Internet Net Sales Var to Plan % and Internet Net Sales (YOY %) measures to these fields further clarifies how the visual is being used.

If the metric being used as the actual value is truly a critical measure (for example, revenue or count of customers) to the organization or the primary user, it's almost certainly appropriate that related target and variance measures are built into the Power BI dataset. In many cases, these additional measures will be used independently in their own visuals and reports. Additionally, if a target value is not readily available, such as the preceding example with the Internet Net Sales Plan, BI teams can work with stakeholders on the proper logic to apply to a target measure, for example, 10 percent greater than the previous year.

The only customization required is the KPI Indicator Index field.
The result of the expression used for this field must correspond to one of five whole numbers (1-5) and thus one of the five available KPI Indicators. In the following example, the KPI Indicators KPI 1 and KPI 2 have been customized to display a green caret up icon and a red caret down icon respectively:

Many different KPI Indicator symbols are available including up and down arrows, flags, stars, and exclamation marks. These different symbols can be formatted and then displayed dynamically based on the KPI Indicator Index field expression. In this example, a KPI index measure was created to return the value 1 or 2 based on the positive or negative value of the Internet Net Sales Var to Plan % measure respectively:

    Internet Net Sales vs Plan Index = IF([Internet Net Sales Var to Plan %] > 0, 1, 2)

Given the positive 4.6 percent variance for November of 2017, the value 1 is returned by the index expression and the green caret up symbol for KPI 1 is displayed. With five available KPI Indicators and their associated symbols, it's possible to embed much more elaborate logic such as five index conditions (for example, poor, below average, average, above average, good) and five corresponding KPI indicators.

Four different layouts (Top, Left, Bottom, and Right) are available to display the values relative to the line chart. In the preceding example, the Top layout is chosen as this results in the last value of the Axis input (2017-Nov) being displayed in the top left corner of the visual. Like the standard line chart visual in Power BI Desktop, the line style (for example, Dotted, Solid, Dashed), color, and thickness can all be customized to help distinguish the different series.

Chiclet Slicer

The standard slicer visual can display the items of a source column as a list or as a dropdown. Additionally, if presented as a list, the slicer can optionally be displayed horizontally rather than vertically.
The custom Chiclet Slicer, developed by Microsoft, allows report authors to take even greater control over the format of slicers to further improve the self-service experience in Power BI reports. In the following example, a Chiclet Slicer has been formatted to display calendar months horizontally as three columns:

Additionally, a dark green color is defined as the Selected Color property under the Chiclets formatting card to clearly identify the current selections (May and June). The Padding and Outline Style properties, also available under the Chiclets card, are set to 1 and Square respectively, to obtain a simple and compact layout.

Like the slicer controls in Microsoft Excel, Chiclet Slicers also support cross highlighting. To enable cross highlighting, specify a measure which references a fact table as the Values input field to the Chiclet Slicer. For example, with the Internet Net Sales measure set as the Values input of the Chiclet Slicer, a user selection on a bar representing a product in a separate visual would update the Chiclet Slicer to indicate the calendar months without Internet Sales for the given product. The Disabled Color property can be set to control the formatting of these unrelated items.

Chiclet Slicers also support images. In the following example, one row is used to display four countries via their national flags:

For this visual, the Padding and Outline Style properties under the Chiclets formatting card are set to 2 and Cut respectively. Like the Calendar Month slicer, a dark green color is configured as the Selected Color property, helping to identify the country or countries selected (Canada, in this example).

The Chiclet Slicer contains three input field wells: Category, Values, and Image. All three input field wells must have a value to display the images. The Category input contains the names of the items to be displayed within the Chiclets.
The Image input takes a column with URL links corresponding to images for the given category values. In this example, the Sales Territory Country column is used as the Category input and the Internet Net Sales measure is used as the Values input to support cross highlighting. The Sales Territory URL column, which is set as an Image URL data category, is used as the Image input. For example, the following Sales Territory URL value is associated with the United States: http://www.crwflags.com/fotw/images/u/us.gif.

A standard slicer visual can also display images when the data category of the field used is set as Image URL. However, the standard slicer is limited to only one input field and thus cannot also display a text column associated with the image. Additionally, the standard slicer lacks the richer cross-highlighting and formatting controls of the Chiclet Slicer.

Impact Bubble Chart

One of the limitations with standard Power BI visuals is the number of distinct measures that can be represented graphically. For example, the standard scatter chart visual is limited to three primary measures (X-AXIS, Y-AXIS, and SIZE), and a fourth measure can be used for color saturation. The Impact Bubble Chart custom visual, released in August of 2017, supports five measures by including a left and right bar input for each bubble. In the following visual, the left and right bars of the Impact Bubble Chart are used to visually indicate the distribution of AdWorks Net Sales between Online and Reseller Sales channels:

The Impact Bubble Chart supports five input field wells: X-AXIS, Y-AXIS, SIZE, LEFT BAR, and RIGHT BAR. In this example, the following five measures are used for each of these fields respectively: AdWorks Net Sales, AdWorks Net Margin %, AdWorks Net Sales (YTD), Internet Net Sales, and Reseller Net Sales. The length of the left bar indicates that Australia's sales are almost exclusively derived from online sales.
Likewise, the length of the right bar illustrates that Canada's sales are almost wholly obtained via Reseller Sales. These graphical insights per item would not be possible with the standard Power BI scatter chart. Specifically, the Internet Net Sales and Reseller Net Sales measures could only be added as Tooltips, thus requiring the user to hover over each individual bubble.

In its current release, the Impact Bubble Chart does not support the formatting of data labels, a legend, or the axis titles. Therefore, a supporting text box can be created to advise the user of the additional measures represented. In the top right corner of this visual, a text box is set against the background to associate measures to the two bars and the size of the bubbles.

Dot Plot by Maq Software

Just as the Impact Bubble Chart supports additional measures, the Dot Plot by Maq Software allows for the visualization of up to four distinct dimension columns. With three Axis fields and a Legend field, a measure can be plotted to a more granular level than any other standard or custom visual currently available to Power BI. Additionally, a rich set of formatting controls is available to customize the Dot Plot's appearance, such as orientation (horizontal or vertical), and whether the Axis categories should be split or stacked.

In the following visual, each bubble represents the internet sales for a specific grouping of the following dimension columns: Sales Territory Country, Product Subcategory, Promotion Type, and Customer History Segment:

For example, one bubble represents the Internet Sales for the Road Bikes Product Subcategory within the United States Sales Territory Country, which is associated with the volume discount promotion type and the first year Customer History Segment. In this visual, the Customer History Segment column is used as the legend and thus the color of each bubble is automatically formatted to one of the three customer history segments.
In the preceding example, the Orientation property is set to Horizontal and the Split labels property under the Axis category formatting card is enabled. The Split labels formatting causes the Sales Territory Country column to be displayed on the opposite axis of the Product Subcategory column. Disabling this property results in the two columns being displayed as a hierarchy on the same axis with the child column (Product Subcategory) positioned inside the parent column (Sales Territory Country).

Despite its power in visualizing many dimension columns and its extensive formatting features, the Dot Plot does not currently support data labels. Therefore, when the maximum of four dimension columns is used, such as in the previous example, it's necessary to hover over the individual bubbles to determine which specific grouping the bubble represents, such as in the following example:

With this, you can easily extend solutions beyond the capabilities of Power BI's standard visuals and support specific, unique, and complex use cases. If you found this tutorial useful, do check out the book Mastering Microsoft Power BI and develop visually rich, immersive, and interactive Power BI reports and dashboards.

Sugandha Lahoti
02 Oct 2018
12 min read

What are REST verbs and status codes [Tutorial]

The name Representational State Transfer (REST) was coined by Roy Fielding from the University of California. It is a very simplified and lightweight style of web service compared to SOAP. Performance, scalability, simplicity, portability, and modifiability are the main principles behind the REST design. REST is a stateless, cacheable, and simple architecture that is not a protocol but a pattern. In this tutorial, we will talk about REST verbs and status codes.

The article is taken from the book Building RESTful Web services with Go by Naren Yellavula. In this book, you will explore the necessary concepts of REST API development by building a few real-world services from scratch.

REST verbs

REST verbs specify an action to be performed on a specific resource or a collection of resources. When a request is made by the client, it should send this information in the HTTP request:

  • REST verb
  • Header information
  • Body (optional)

As we mentioned previously, REST uses the URI to identify the resource to be handled. There are quite a few REST verbs available, but six of them are used frequently. They are as follows:

  • GET
  • POST
  • PUT
  • PATCH
  • DELETE
  • OPTIONS

If you are a software developer, you will be dealing with these six most of the time. The following table explains the operation, target resource, and what happens if the request succeeds or fails:

REST verb | Action | Success | Failure
GET | Fetches a record or set of resources from the server | 200 | 404
OPTIONS | Fetches all available REST operations | 200 | -
POST | Creates a new set of resources or a resource | 201 | 404, 409
PUT | Updates or replaces the given record | 200, 204 | 404
PATCH | Modifies the given record | 200, 204 | 404
DELETE | Deletes the given resource | 200 | 404

The numbers in the Success and Failure columns of the preceding table are HTTP status codes. Whenever a client initiates a REST operation, since REST is stateless, the client needs a way to find out whether the operation was successful or not.
For that reason, HTTP has status codes for the response. REST defines the preceding status code types for a given operation. This means a REST API should strictly follow the preceding rules to achieve client-server communication.

All defined REST services have the following format. It consists of the host and API endpoint. The API endpoint is the URL path which is predefined by the server. Every REST request should hit that path. A trivial REST API URI:

http://HostName/API endpoint/Query(optional)

Let us look at all the verbs in more detail. The REST API design starts with the definition of operations and API endpoints. Before implementing the API, the design document should list all the endpoints for the given resources. In the following section, we carefully observe the REST API endpoints using PayPal's REST API as a use case.

GET

A GET method fetches the given resource from the server. To specify a resource, GET uses a few types of URI queries:

  • Query parameters
  • Path-based parameters

In case you didn't know, all of your browsing of the web is done by performing GET requests to a server. For example, if you type www.google.com, you are actually making a GET request to fetch the search page. Here, your browser is the client and Google's web server is the backend implementer of web services. A successful GET operation returns a 200 status code.

Examples of path parameters:

Everyone knows PayPal. PayPal creates billing agreements with companies. If you register with PayPal for a payment system, they provide you with a REST API for all your billing needs. The sample GET request for getting the information of a billing agreement looks like this: /v1/payments/billing-agreements/agreement_id.

Here, the resource is queried with a path parameter. When the server sees this line, it interprets it as "I got an HTTP request with a need for agreement_id from the billing agreements."
Then it searches through the database, goes to the billing-agreements table, and finds an agreement with the given agreement_id. If that resource exists, it sends a copy of the details back in the response (200 OK). Otherwise, it sends a response saying the resource was not found (404).

Using GET, you can also query a list of resources, instead of a single one like the preceding example. PayPal's billing transactions related to an agreement can be fetched with /v1/payments/billing-agreements/agreement_id/transactions. This line fetches all transactions that occurred on that billing agreement. In both cases, data is retrieved in the form of a JSON response. The response format should be designed beforehand so that the client can consume it as agreed.

Examples of query parameters are as follows:

Query parameters are intended to add detailed information to identify a resource from the server. For example, take this sample fictitious API. Let us assume this API is created for fetching, creating, and updating the details of books. A query parameter based GET request will be in this format:

/v1/books/?category=fiction&publish_date=2017

The preceding URI has a few query parameters. The URI is requesting books from the books resource that satisfy the following conditions:

  • It should be a fiction book
  • The book should have been published in the year 2017

"Get all the fiction books that were released in the year 2017" is the question the client is posing to the server.

Path vs Query parameters: when to use them?

It is a common rule of thumb that Query parameters are used to fetch multiple resources that match the given criteria. If a client needs a single resource with exact URI information, it can use Path parameters to specify the resource. For example, a user dashboard can be requested with Path parameters, and fetching filtered data can be modeled with Query parameters. Use Path parameters for a single resource and Query parameters for multiple resources in a GET request.
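As a rough illustration of the rule of thumb above, the following sketch uses Python's standard urllib.parse module to build both kinds of URI and to read the query parameters back. The host api.example.com, the book ID 1256, and the helper names are assumptions made for this example; only the /v1/books paths come from the fictitious API in the text.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://api.example.com"  # hypothetical host, used only for illustration


def path_uri(book_id):
    """Path parameter: the ID in the path identifies exactly one resource."""
    return f"{BASE}/v1/books/{book_id}"


def query_uri(**filters):
    """Query parameters: the filters select a set of resources."""
    return f"{BASE}/v1/books/?{urlencode(filters)}"


def read_filters(uri):
    """Parse the query string back into a dict (first value of each key)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(uri).query).items()}


single = path_uri(1256)
many = query_uri(category="fiction", publish_date=2017)
filters = read_filters(many)

print(single)
print(many)
print(filters)
```

Note that values parsed from a query string always arrive as strings, so a server would still need to validate and convert publish_date before using it as a number.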
POST, PUT, and PATCH

The POST method is used to create a resource on the server. In the previous books API, this operation creates a new book with the given details. A successful POST operation returns a 201 status code. The POST request targets the collection of resources: /v1/books. The POST request has a body like this:

{"name" : "Lord of the rings", "year": 1954, "author" : "J. R. R. Tolkien"}

This actually creates a new book in the database. An ID is assigned to this record, and that ID forms part of the URL we later use to GET the resource. So POST should be done only once, when the resource is first created. In fact, Lord of the Rings was published in 1955. So we entered the published date incorrectly. In order to update the resource, let us use the PUT request.

The PUT method is similar to POST. It is used to replace a resource that already exists. The main difference is that PUT is idempotent. Calling POST twice creates two instances with the same data. But PUT updates a single resource that already exists: /v1/books/1256 with a JSON body like this:

{"name" : "Lord of the rings", "year": 1955, "author" : "J. R. R. Tolkien"}

1256 is the ID of the book. The request updates the preceding book with year: 1955. Did you observe the drawback of PUT? It actually replaced the entire old record with the new one. We needed to change a single column, but PUT replaced the whole record. That is bad. For this reason, the PATCH request was introduced.

The PATCH method is similar to PUT, except it won't replace the whole record. PATCH, as the name suggests, patches the column that is being modified. Let us update the book 1256 with a new column called ISBN: /v1/books/1256 with a JSON body like this:

{"isbn" : "0618640150"}

It tells the server, "Search for the book with ID 1256. Then add or modify this column with the given value." PUT and PATCH both return the 200 status for success and 404 for not found.

DELETE and OPTIONS

The DELETE API method is used to delete a resource from the database.
It is similar to PUT but without any body. It just needs the ID of the resource to be deleted. Once a resource gets deleted, subsequent GET requests return a 404 not found status. Responses to this method are not cacheable (in case caching is implemented). The DELETE method is idempotent.

The OPTIONS API method is the most underrated in API development. Given the resource, this method discovers all the methods (GET, POST, and so on) defined for it on the server. It is like looking at the menu card at a restaurant and then ordering an item which is available (whereas if you randomly order a dish, the waiter will tell you it is not available). It is best practice to implement the OPTIONS method on the server. From the client, make sure OPTIONS is called first, and if the method is available, then proceed with it.

Cross-Origin Resource Sharing (CORS)

The most important application of this OPTIONS method is Cross-Origin Resource Sharing (CORS). Initially, browser security prevented the client from making cross-origin requests. It means a site loaded with the URL www.foo.com can only make API calls to that host. If the client code needs to request files or data from www.bar.com, then the second server, bar.com, should have a mechanism to recognize foo.com before serving its resources.

The CORS process works as follows:

  • foo.com requests the OPTIONS method on bar.com.
  • bar.com sends a header like Access-Control-Allow-Origin: http://foo.com in response to the client.
  • Next, foo.com can access the resources on bar.com using any REST method, without any restrictions.

If bar.com is willing to supply resources to any host after one initial request, it can set Access-Control-Allow-Origin to * (that is, any).

[Figure: diagram depicting the steps of the CORS process, one after the other]

Types of status codes

There are a few families of status codes. Each family globally explains an operation status. Each member of that family may have a deeper meaning.
So a REST API should strictly tell the client what exactly happened after the operation. There are 60+ status codes available. But for REST, we concentrate on a few families of codes.

2xx family (successful)

200 and 201 fall under the success family. They indicate that an operation was successful. Plain 200 (Operation Successful) indicates a successful CRUD operation:

  • 200 (Successful Operation) is the most common type of response status code in REST
  • 201 (Successfully Created) is returned when a POST operation successfully creates a resource on the server
  • 204 (No Content) is issued when a client needs a status but not any data back

3xx family (redirection)

These status codes are used to convey redirection messages. The most important ones are 301 and 304:

  • 301 is issued when a resource is moved permanently to a new URL endpoint. It is essential when an old API is deprecated. It returns the new endpoint in the response with the 301 status. On seeing that, the client should use the new URL from the response to reach its target.
  • 304 indicates that content is cached and no modification happened for the resource on the server. This helps the client cache content and request the data again only when the resource has been modified.

4xx family (client error)

These are the standard error status codes which the client needs to interpret in order to handle further actions. They are caused by the request rather than by a fault in the server: a wrong request format or an ill-formed REST method can cause these errors. Of these, the most frequent status codes API developers use are 400, 401, 403, 404, and 405:

  • 400 (Bad Request) is returned when the server cannot understand the client request.
  • 401 (Unauthorized) is returned when the client is not sending the authorization information in the header.
  • 403 (Forbidden) is returned when the client has no access to a certain type of resource.
  • 404 (Not Found) is returned when the client requests a resource that does not exist.
  • 405 (Method Not Allowed) is returned when the server does not allow the requested method on a resource. GET and HEAD are exceptions.

5xx family (server error)

These are the errors from the server. The client request may be perfect, but due to a bug in the server code, these errors can arise. The commonly used status codes are 500, 501, 502, 503, and 504:

  • 500 (Internal Server Error) indicates a development error caused by some buggy code or some unexpected condition
  • 501 (Not Implemented) is returned when the server does not support the method on a resource
  • 502 (Bad Gateway) is returned when the server itself got an error response from another service vendor
  • 503 (Service Unavailable) is returned when the server is down for reasons such as a heavy load or maintenance
  • 504 (Gateway Timeout) is returned when the server has waited too long for a response from another vendor and is taking too much time to serve the client

For more details on status codes, visit this link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

In this article, we gave an introduction to the REST API and then talked about REST verbs and status codes. We saw what a given status code refers to. Next, to dig deeper into URL routing with REST APIs, read our book Building RESTful Web services with Go.

Design a RESTful web API with Java [Tutorial]
What RESTful APIs can do for Cloud, IoT, social media and other emerging technologies
Building RESTful web services with Kotlin
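As a small recap of the status-code families covered in this article, the sketch below maps a numeric code to its family and its standard reason phrase using Python's built-in http.HTTPStatus. The FAMILIES dictionary and the describe helper are names invented for this illustration and are not part of any REST framework.

```python
from http import HTTPStatus

# Family names as used in this article, keyed by the first digit of the code.
FAMILIES = {
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return a short label such as '404 Not Found (client error)'."""
    status = HTTPStatus(code)                   # raises ValueError for unknown codes
    family = FAMILIES.get(code // 100, "other")
    return f"{code} {status.phrase} ({family})"


for code in (200, 201, 204, 301, 304, 400, 401, 403, 404, 405, 500, 501, 502, 503, 504):
    print(describe(code))
```

A client could use the same first-digit check to decide, for example, whether a failed request is worth retrying (5xx) or needs to be corrected first (4xx).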

Packt Editorial Staff
07 May 2018
13 min read

Implementing 3 Naive Bayes classifiers in scikit-learn

Scikit-learn provides three naive Bayes implementations: Bernoulli, multinomial, and Gaussian. The only difference between them is the probability distribution adopted. The first one is a binary algorithm, particularly useful when a feature can be present or absent. Multinomial naive Bayes assumes feature vectors where each element represents the number of times a term appears (or, very often, its frequency). This technique is very efficient in natural language processing or whenever the samples are composed starting from a common dictionary. Gaussian naive Bayes, instead, is based on a continuous distribution and is suitable for more generic classification tasks. Now that we have established that naive Bayes variants are a handy set of algorithms to have in our machine learning arsenal and that scikit-learn is a good tool to implement them, let's rewind a bit.

What is Naive Bayes?

Naive Bayes is a family of powerful and easy-to-train classifiers, which determine the probability of an outcome, given a set of conditions, using Bayes' theorem. In other words, the conditional probabilities are inverted so that the query can be expressed as a function of measurable quantities. The approach is simple, and the adjective naive has been attributed not because these algorithms are limited or less efficient, but because of a fundamental assumption about the causal factors that we will discuss. Naive Bayes classifiers are multi-purpose and it's easy to find their application in many different contexts. However, the performance is particularly good in all those situations where the probability of a class is determined by the probabilities of some causal factors. A good example is given by natural language processing, where a text can be considered as a particular instance of a dictionary and the relative frequencies of all terms provide enough information to infer the class it belongs to.
Our examples are deliberately generic, so that you can understand the application of naive Bayes in various contexts.

The Bayes' theorem

Let's consider two probabilistic events A and B. We can correlate the marginal probabilities P(A) and P(B) with the conditional probabilities P(A|B) and P(B|A) using the product rule:

P(A ∩ B) = P(A|B) P(B)
P(B ∩ A) = P(B|A) P(A)

Considering that the intersection is commutative, the first members are equal, so we can derive Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)

This formula has very deep philosophical implications and it's a fundamental element of statistical learning. First of all, let's consider the marginal probability P(A): this is normally a value that determines how probable a target event is, like P(Spam) or P(Rain). As there are no other elements, this kind of probability is called a priori, because it's often determined by mathematical considerations or simply by a frequency count. For example, imagine we want to implement a very simple spam filter and we've collected 100 emails. We know that 30 are spam and 70 are regular. So we can say that P(Spam) = 0.3.

However, we'd like to evaluate using some criteria (for simplicity, let's consider a single one), for example, the e-mail text is shorter than 50 characters. Therefore, our query becomes:

P(Spam | Text < 50 chars) = P(Text < 50 chars | Spam) P(Spam) / P(Text < 50 chars)

The first term is similar to P(Spam) because it's the probability of spam given a certain condition. For this reason, it's called a posteriori (in other words, it's a probability that we can estimate after knowing some additional elements). On the right side, we need to calculate the missing values, but it's simple. Let's suppose that 35 emails have a text shorter than 50 characters, so P(Text < 50 chars) = 0.35, and, looking only into our spam folder, we discover that only 25 spam emails have a short text, so that P(Text < 50 chars | Spam) = 25/30 = 0.83. The result is:

P(Spam | Text < 50 chars) = (0.83 · 0.3) / 0.35 ≈ 0.71

So, after receiving a very short email, there is a 71% probability that it's spam.
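The arithmetic of this spam example can be checked with a few lines of Python. This is only a sketch that uses the counts quoted in the text (100 emails, 30 spam, 35 short, 25 short and spam); the variable names are chosen for this illustration, and exact fractions are used to avoid rounding errors.

```python
from fractions import Fraction

total_emails = 100
spam_emails = 30
short_emails = 35        # emails whose text is shorter than 50 characters
short_spam_emails = 25   # short emails found in the spam folder

p_spam = Fraction(spam_emails, total_emails)                   # a priori P(Spam)
p_short = Fraction(short_emails, total_emails)                 # P(Text < 50 chars)
p_short_given_spam = Fraction(short_spam_emails, spam_emails)  # P(Text < 50 chars | Spam)

# Bayes' theorem: a posteriori = likelihood * a priori / normalization
posterior = p_short_given_spam * p_spam / p_short

print(posterior, round(float(posterior), 2))
```

The exact result is 5/7, which is the same as 25/35: of the 35 short emails, 25 are spam. Bayes' theorem reaches that value without ever counting the intersection directly.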
Now we can understand the role of P(Text < 50 chars | Spam): as we have actual data, we can measure how probable our hypothesis is given the query. In other words, we have defined a likelihood (compare this with logistic regression), which is a weight between the a priori probability and the a posteriori one (the term in the denominator is less important because it works as a normalizing factor):

P(A|B) = P(B|A) P(A) / P(B), that is, a posteriori = likelihood · a priori / normalization

The normalization factor is often represented by the Greek letter alpha, so the formula becomes:

P(A|B) = α P(B|A) P(A)

The last step is considering the case when there are more concurrent conditions (which is more realistic in real-life problems):

P(A | C1 ∩ C2 ∩ ... ∩ Cn) = α P(C1 ∩ C2 ∩ ... ∩ Cn | A) P(A)

A common assumption is called conditional independence (in other words, the effects produced by every cause are independent of each other) and it allows us to write a simplified expression:

P(A | C1 ∩ C2 ∩ ... ∩ Cn) = α P(C1|A) P(C2|A) ... P(Cn|A) P(A)

Naive Bayes classifiers

A naive Bayes classifier is so called because it's based on a naive condition, which implies the conditional independence of causes. This can seem very difficult to accept in many contexts where the probability of a particular feature is strictly correlated to another one. For example, in spam filtering, a text shorter than 50 characters can increase the probability of the presence of an image, or if the domain has already been blacklisted for sending the same spam emails to millions of users, it's likely to find particular keywords. In other words, the presence of a cause isn't normally independent of the presence of other ones. However, in Zhang H., The Optimality of Naive Bayes, AAAI 1, no. 2 (2004): 3, the author showed that under particular conditions (not so rare to happen), different dependencies cancel one another out, and a naive Bayes classifier succeeds in achieving very high performance even if its naiveness is violated.

Let's consider a dataset:

X = {x1, x2, ..., xn}, where each xi is a vector of m real-valued features

Every feature vector, for simplicity, will be represented as:

x = [x1, x2, ..., xm]

We also need a target dataset:

Y = {y1, y2, ..., yn}

where each y can belong to one of P different classes.
Considering Bayes' theorem under conditional independence, we can write:

P(y | x1, x2, ..., xm) = α P(y) P(x1|y) P(x2|y) ... P(xm|y)

The values of the marginal a priori probability P(y) and of the conditional probabilities P(xi|y) are obtained through a frequency count. Therefore, given an input vector x, the predicted class is the one whose a posteriori probability is maximum.

Naive Bayes in scikit-learn

scikit-learn implements three naive Bayes variants based on the same number of different probabilistic distributions: Bernoulli, multinomial, and Gaussian. The first one is a binary distribution, useful when a feature can be present or absent. The second one is a discrete distribution, used whenever a feature must be represented by a whole number (for example, in natural language processing, it can be the frequency of a term), while the third is a continuous distribution characterized by its mean and variance.

Bernoulli naive Bayes

If X is a Bernoulli-distributed random variable, it can assume only two values (for simplicity, let's call them 0 and 1) and their probability is:

P(X = 1) = p and P(X = 0) = q, where q = 1 - p and 0 < p < 1

To try this algorithm with scikit-learn, we're going to generate a dummy dataset. Bernoulli naive Bayes expects binary feature vectors; however, the class BernoulliNB has a binarize parameter, which allows specifying a threshold that will be used internally to transform the features:

>>> from sklearn.datasets import make_classification
>>> nb_samples = 300
>>> X, Y = make_classification(n_samples=nb_samples, n_features=2, n_informative=2, n_redundant=0)

[Figure: the generated bidimensional dataset]

We have decided to use 0.0 as a binary threshold, so each point can be characterized by the quadrant where it's located. Of course, this is a rational choice for our dataset, but Bernoulli naive Bayes is designed for binary feature vectors or continuous values that can be precisely split with a predefined threshold.
>>> import numpy as np
>>> from sklearn.naive_bayes import BernoulliNB
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)
>>> # Features are thresholded at 0.0 internally before fitting
>>> bnb = BernoulliNB(binarize=0.0)
>>> bnb.fit(X_train, Y_train)
>>> bnb.score(X_test, Y_test)
0.85333333333333339

The score is rather good, but if we want to understand how the binary classifier worked, it's useful to see how the data have been internally binarized:

[Figure: the dataset after internal binarization]

Now, checking the naive Bayes predictions, we obtain:

>>> data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
>>> bnb.predict(data)
array([0, 0, 1, 1])

Which is exactly what we expected.

Multinomial naive Bayes

A multinomial distribution is useful to model feature vectors where each value represents, for example, the number of occurrences of a term or its relative frequency. If the feature vectors have n elements and each of them can assume k different values with probability pk, then:

P(X1 = x1, X2 = x2, ..., Xk = xk) = n! / (x1! x2! ... xk!) · p1^x1 · p2^x2 · ... · pk^xk

The conditional probabilities P(xi|y) are computed with a frequency count (which corresponds to applying a maximum likelihood approach), but in this case, it's important to consider the alpha parameter (called the Laplace smoothing factor), whose default value is 1.0 and which prevents the model from setting null probabilities when the frequency is zero. It's possible to assign any non-negative value; however, larger values will assign higher probabilities to the missing features, and this choice could alter the stability of the model. In our example, we're going to consider the default value of 1.0.

For our purposes, we're going to use the DictVectorizer. There are automatic instruments to compute the frequencies of terms, but we're going to discuss them later. Let's consider only two records: the first one representing a city, while the second one represents the countryside.
Our dictionary contains hypothetical frequencies, as if the terms were extracted from a text description:

>>> import numpy as np
>>> from sklearn.feature_extraction import DictVectorizer
>>> data = [
...     {'house': 100, 'street': 50, 'shop': 25, 'car': 100, 'tree': 20},
...     {'house': 5, 'street': 5, 'shop': 0, 'car': 10, 'tree': 500, 'river': 1}
... ]
>>> dv = DictVectorizer(sparse=False)
>>> X = dv.fit_transform(data)
>>> Y = np.array([1, 0])
>>> X
array([[ 100., 100., 0., 25., 50., 20.],
       [ 10., 5., 1., 0., 5., 500.]])

Note that the term 'river' is missing from the first set, so it's useful to keep alpha equal to 1.0 to give it a small probability. The output classes are 1 for city and 0 for the countryside. Now we can train a MultinomialNB instance:

>>> from sklearn.naive_bayes import MultinomialNB
>>> mnb = MultinomialNB()
>>> mnb.fit(X, Y)
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

To test the model, we create a dummy city with a river and a dummy country place without any river:

>>> test_data = [
...     {'house': 80, 'street': 20, 'shop': 15, 'car': 70, 'tree': 10, 'river': 1},
...     {'house': 10, 'street': 5, 'shop': 1, 'car': 8, 'tree': 300, 'river': 0}
... ]
>>> # Reuse the vocabulary learned during training: transform, not fit_transform
>>> mnb.predict(dv.transform(test_data))
array([1, 0])

As expected, the prediction is correct. Later on, when discussing some elements of natural language processing, we're going to use multinomial naive Bayes for text classification with larger corpora. Even if the multinomial distribution is based on the number of occurrences, it can be successfully used with frequencies or more complex functions.

Gaussian Naive Bayes

Gaussian naive Bayes is useful when working with continuous values whose probabilities can be modeled using a Gaussian distribution:

P(x) = 1 / sqrt(2πσ²) · exp(-(x - μ)² / (2σ²))

The conditional probabilities P(xi|y) are also Gaussian distributed and, therefore, it's necessary to estimate the mean and variance of each of them using the maximum likelihood approach.
This is quite easy; in fact, considering the property of a Gaussian, we get:

L(μ, σ²; xi | y) = log ∏k P(xi(k) | y) = Σk log P(xi(k) | y)

where the k index refers to the samples in our dataset and P(xi|y) is a Gaussian itself. By minimizing the inverse of this expression (in Russell S., Norvig P., Artificial Intelligence: A Modern Approach, Pearson, there's a complete analytical explanation), we get the mean and variance for each Gaussian associated with P(xi|y), and the model is hence trained.

As an example, we compare Gaussian naive Bayes with logistic regression using the ROC curves. The dataset has 300 samples with two features. Each sample belongs to a single class:

>>> from sklearn.datasets import make_classification
>>> nb_samples = 300
>>> X, Y = make_classification(n_samples=nb_samples, n_features=2, n_informative=2, n_redundant=0)

[Figure: plot of the dataset]

Now we can train both models and generate the ROC curves (the Y scores for naive Bayes are obtained through the predict_proba method):

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.metrics import roc_curve, auc
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)
>>> gnb = GaussianNB()
>>> gnb.fit(X_train, Y_train)
>>> Y_gnb_score = gnb.predict_proba(X_test)
>>> lr = LogisticRegression()
>>> lr.fit(X_train, Y_train)
>>> Y_lr_score = lr.decision_function(X_test)
>>> # Column 1 of predict_proba holds the probability of the positive class
>>> fpr_gnb, tpr_gnb, thresholds_gnb = roc_curve(Y_test, Y_gnb_score[:, 1])
>>> fpr_lr, tpr_lr, thresholds_lr = roc_curve(Y_test, Y_lr_score)

[Figure: the resulting ROC curves]

Naive Bayes performs slightly better than logistic regression here; however, the two classifiers have similar accuracy and Area Under the Curve (AUC). It's interesting to compare the performances of Gaussian and multinomial naive Bayes on the handwritten digits dataset bundled with scikit-learn (load_digits).
Each sample (belonging to one of 10 classes) is an 8x8 image whose pixels are encoded as unsigned integers (0 to 16 in load_digits); therefore, even if each feature doesn't represent an actual count, it can be considered as a sort of magnitude or frequency.

>>> from sklearn.datasets import load_digits
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.naive_bayes import GaussianNB, MultinomialNB
>>> digits = load_digits()
>>> gnb = GaussianNB()
>>> mnb = MultinomialNB()
>>> cross_val_score(gnb, digits.data, digits.target, scoring='accuracy', cv=10).mean()
0.81035375835678214
>>> cross_val_score(mnb, digits.data, digits.target, scoring='accuracy', cv=10).mean()
0.88193962163008377

The multinomial naive Bayes performs better than the Gaussian variant, and the result is not really surprising. In fact, each sample can be thought of as a feature vector derived from a dictionary of 64 symbols. The value can be the count of each occurrence, so a multinomial distribution can better fit the data, while a Gaussian is slightly more limited by its mean and variance.

We've presented the generic naive Bayes approach, starting from Bayes' theorem and its intrinsic philosophy. The naiveness of such an algorithm is due to the choice to assume all the causes to be conditionally independent. It means that each contribution is the same in every combination and the presence of a specific cause cannot alter the probability of the other ones. This is not often realistic; however, under some assumptions, it's possible to show that internal dependencies cancel each other out, so that the resulting probability appears unaffected by their relations.

You read an excerpt from the book Machine Learning Algorithms, written by Giuseppe Bonaccorso. This book will help you build a strong foundation to enter the world of machine learning and data science. You will learn to build a data model and see how it behaves using different ML algorithms, explore support vector machines, recommendation systems, and even create a machine learning architecture from scratch. Grab your copy today!

What is Naïve Bayes classifier?
Machine Learning Algorithms: Implementing Naive Bayes with Spark MLlib
Implementing Apache Spark MLlib Naive Bayes to classify digital breath test data for drunk driving

Vincy Davis
07 May 2019
9 min read

Which Python framework is best for building RESTful APIs? Django or Flask?

Python is one of the top-rated programming languages. It's also known for its less-complex syntax, and its high-level, object-oriented, robust, and general-purpose programming. Python is a top choice for many first-time programmers. Since its release in 1991, Python has evolved and is now powered by several frameworks for web application development, scientific and mathematical computing, and graphical user interfaces, up to the latest REST API frameworks.

This article is an excerpt taken from the book 'Hands-On RESTful API Design Patterns and Best Practices', written by Harihara Subramanian and Pethuru Raj. This book covers design strategy, essential and advanced RESTful API patterns, and legacy modernization to microservices-centric apps.

In this article, we'll explore two comprehensive frameworks, Django and Flask, so that you can choose the best one for developing your RESTful API.

Django

Django is an open source web framework, available under the BSD license, designed to help developers create their web apps very quickly, as it takes care of additional web-development needs. It includes several packages (also known as applications) to handle typical web-development tasks, such as authentication, content administration, scaffolding, templates, caching, and syndication. Let's look at the Django REST Framework (DRF), built with Python, and use it for REST API development and deployment.

Django Rest Framework

DRF is an open source, mature Python and Django library intended to help app developers build sophisticated web APIs. DRF's modular, flexible, and customizable architecture makes the development of both simple, turnkey API endpoints and complicated REST constructs possible. The goal of DRF is to take a model, generalize its wire representation (such as JSON or XML), and customize a set of class-based views to satisfy a specific API endpoint, using a serializer that describes the mapping between views and API endpoints.
Core features

DRF has many distinct features, including:

Web-browsable API

This feature enhances the REST API developed with DRF. It has a rich interface, and the web-browsable API supports multiple media types too. The browsable API means that the APIs we build will be self-describing, and that the API endpoints we create as part of the REST services can return JSON or HTML representations. The interesting fact about the web-browsable API is that we can interact with it fully through the browser, and any endpoint that we interact with using a programmatic client will also be capable of responding with a browser-friendly view of the web-browsable API.

Authentication

One of the most attractive features of DRF is authentication; it supports broad categories of authentication schemes, from basic authentication, token authentication, session authentication, and remote user authentication to OAuth authentication. It also supports custom authentication schemes if we wish to implement one. DRF runs the authentication scheme at the start of the view, that is, before any other code is allowed to proceed. DRF determines the privileges of the incoming request from the permission and throttling policies and then decides whether the incoming request can be allowed or disallowed with the matched credentials.

Serialization and deserialization

Serialization is the process of converting complex data, such as querysets and model instances, into native Python datatypes. This conversion makes it easy to render the native datatypes into formats such as JSON or XML. DRF supports serialization through serializer classes. The serializers of DRF are similar to Django's Form and ModelForm classes. DRF provides a Serializer class, which helps to control the output of responses. The DRF ModelSerializer classes provide a simple mechanism with which we can create serializers that deal with model instances and querysets.
Serializers also handle deserialization; that is, they allow parsed data to be converted back into complex types. Deserialization happens only after the incoming data has been validated.

Other noteworthy features

Here are some other noteworthy features of the DRF:

Routers: The DRF supports automatic URL routing to Django and provides a consistent and straightforward way to wire the view logic to a set of URLs
Class-based views: A dominant pattern that enables the reusability of common functionality
Hyperlinking APIs: The DRF supports various styles (using primary keys, hyperlinking between entities, and so on) to represent the relationship between entities
Generic views: Allow us to build API views that map to the database models

DRF has many other features, such as caching, throttling, and testing.

Benefits of the DRF

Here are some of the benefits of the DRF:

Web-browsable API
Authentication policies
Powerful serialization
Extensive documentation and excellent community support
Simple yet powerful
Test coverage of source code
Secure and scalable
Customizable

Drawbacks of the DRF

Here are some facts that may disappoint some Python app developers who intend to use the DRF:

Monolithic; components get deployed together
Based on the Django ORM
Steep learning curve
Slow response time

Flask

Flask is a microframework for Python developers based on Werkzeug (a WSGI toolkit) and Jinja 2 (a template engine). It is released under the BSD license. Flask is very easy to set up and simple to use. Like other frameworks, it comes with several out-of-the-box capabilities, such as a built-in development server, debugger, unit test support, templating, secure cookies, and RESTful request dispatching. The Flask-RESTful API framework is discussed below.

Flask-RESTful

Flask-RESTful is an extension for Flask that provides additional support for building REST APIs and keeps the time needed to develop an API short.
Flask-RESTful is a lightweight abstraction that works with your existing ORM and libraries. It encourages best practices with minimal setup.

Core features of Flask-RESTful

Flask-RESTful comes with several built-in features. The RESTful frameworks for Django and Flask have a lot in common, because they share almost the same supporting core features. The features unique to Flask-RESTful are described below.

Resourceful routing

The design goal of Flask-RESTful is to provide resources built on top of Flask pluggable views. The pluggable views provide a simple way to access the HTTP methods. Consider the following example code:

class Todo(Resource):
    def get(self, user_id):
        ....
    def delete(self, user_id):
        ....
    def put(self, user_id):
        args = parser.parse_args()
        ....

Restful request parsing

Request parsing refers to an interface modeled after argparse, Python's parser interface for command-line arguments. The RESTful request parser is designed to provide uniform and straightforward access to any variable that comes within the (flask.request) request object.

Output fields

In most cases, app developers prefer to control how response data is rendered, and Flask-RESTful provides a mechanism where you can use ORM models or even custom classes as objects to render. App developers also don't need to worry about exposing internal data structures, because the framework lets them format and filter the response objects. So, when we look at the code, it is evident which data will be rendered and how it will be formatted.

Other noteworthy features

Here are some other noteworthy features of Flask-RESTful:

API: This is the main entry point for the RESTful API, which we initialize with the Flask application.
ReqParse: This enables us to add and parse multiple arguments in the context of a single request.
Input: A useful functionality; it parses the input string and returns true or false depending on the input.
If the input is from the JSON body, the type is already a native Boolean and is passed through without further parsing.

Benefits of the Flask framework

Here are some of the benefits of the Flask framework:

Built-in development server and debugger
Out-of-the-box RESTful request dispatching
Support for secure cookies
Integrated unit-test support
Lightweight
Very minimal setup
Faster (performance)
Easy NoSQL integration
Extensive documentation

Drawbacks of Flask

Here are some of Flask and Flask-RESTful's disadvantages:

Version management (managed by developers)
No browsable API
May incur a steep learning curve

Frameworks – a table of reference

The following table provides a quick reference of a few other prominent micro-frameworks, their features, and supported programming languages:

Language | Framework | Short description | Prominent features
Java | Blade | Fast and elegant MVC framework for Java8 | Lightweight; High performance; Based on the MVC pattern; RESTful-style router interface; Built-in security
Java/Scala | Play Framework | High-velocity Reactive web framework for Java and Scala | Lightweight, stateless, and web-friendly architecture; Built on Akka; Supports predictable and minimal resource-consumption for highly-scalable applications; Developer-friendly
Java | Ninja Web Framework | Full-stack web framework | Fast; Developer-friendly; Rapid prototyping; Plain vanilla Java, dependency injection, first-class IDE integration; Simple and fast to test (mocked tests/integration tests); Excellent build and CI support; Clean codebase – easy to extend
Java | RESTEASY | JBoss-based implementation that integrates several frameworks to help build RESTful web and Java applications | Fast and reliable; Large community; Enterprise-ready; Security support
Java | RESTLET | A lightweight and comprehensive framework based on Java, suitable for both server and client applications | Lightweight; Large community; Native REST support; Connectors set
JavaScript | Express.js | Minimal and flexible Node.js-based JavaScript framework for mobile and web applications | HTTP utility methods; Security updates; Templating engine
PHP | Laravel | An open source web-app builder based on PHP and the MVC architecture pattern | Intuitive interface; Blade template engine; Eloquent ORM as default
Elixir | Phoenix (Elixir) | Powered with the Elixir functional language, a reliable and faster micro-framework | MVC-based; High application performance; Erlang virtual machine enables better use of resources
Python | Pyramid | Python-based micro-framework | Lightweight; Function decorators; Events and subscribers support; Easy implementations and high productivity

Summary

It's evident that Python has two excellent frameworks. Depending on the programming language you intend to use and the features you require, you can choose the type of framework to work with.

If you are interested in learning more about the design strategy, guidelines, and best practices of RESTful API patterns, you can refer to the book 'Hands-On RESTful API Design Patterns and Best Practices'.
Vijin Boricha
06 Aug 2018
9 min read

Using statistical tools in Wireshark for packet analysis [Tutorial]

One of Wireshark's strengths is its statistical tools. Wireshark offers various types of tools, from simple ones for listing end nodes and conversations to more sophisticated ones such as flow and I/O graphs. In this article, we will look at the simple tools in Wireshark that provide basic network statistics, that is, who talks to whom over the network, which devices are chatty, what packet sizes run over the network, and so on. To open the statistics tools, start Wireshark and choose Statistics from the main menu.

This article is an excerpt from Network Analysis using Wireshark 2 Cookbook - Second Edition written by Nagendra Kumar Nainar, Yogesh Ramdoss, Yoram Orzach.

Using the statistics for capture file properties menu

In this recipe, we will learn how to get general information from the data that runs over the network. The Capture File Properties menu in Wireshark 2 replaces the Summary menu in Wireshark 1. Start Wireshark and click on Statistics.

How to do it...

From the Statistics menu, choose Capture File Properties. What you will get is the Capture File Properties window (displayed in the following screenshot), which contains the following:

File: Provides file data, such as filename and path, length, and so on
Time: Start time, end time, and duration of capture
Capture: Hardware information for the PC that Wireshark is installed on
Interfaces: Interface information: the interface registry identifier on the left, whether a capture filter is turned on, the interface type, and the packet size limit
Statistics: General capture statistics, including captured and displayed packets

How it works...

This menu simply gives a summary of the filtered data properties and the capture statistics (average packets or bytes per second).
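To make these averages concrete, here is a small Python sketch of how such capture statistics could be derived from per-packet timestamps and lengths. The packet list and the capture_summary function are made up for illustration; this is not how Wireshark itself is implemented.

```python
def capture_summary(packets):
    """Summarize a capture given (timestamp_seconds, length_bytes) tuples."""
    if not packets:
        raise ValueError("the capture contains no packets")
    timestamps = [ts for ts, _ in packets]
    total_bytes = sum(length for _, length in packets)
    duration = max(timestamps) - min(timestamps)
    return {
        "packets": len(packets),
        "bytes": total_bytes,
        "duration_s": duration,
        # Rates are only meaningful when the capture spans some time
        "avg_packets_per_s": len(packets) / duration if duration else None,
        "avg_bytes_per_s": total_bytes / duration if duration else None,
        "avg_packet_size": total_bytes / len(packets),
    }


# Hypothetical capture: four packets spread over two seconds
packets = [(0.0, 60), (0.5, 1500), (1.0, 1500), (2.0, 940)]
summary = capture_summary(packets)
print(summary)
```

With this sample data the capture holds 4,000 bytes over 2 seconds, so the sketch reports 2 packets per second, 2,000 bytes per second, and an average packet size of 1,000 bytes.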
Using the statistics for protocol hierarchy menu

In this recipe, we will learn how to get protocol hierarchy information for the data that runs over the network. Start Wireshark and click on Statistics.

How to do it...

From the Statistics menu, choose Protocol Hierarchy. What you will get is data about the protocol distribution in the captured file. The partial screenshot displayed here depicts the statistics of packets captured on a per-protocol basis. The Protocol Hierarchy window contains the following columns:

Protocol: The protocol name
Percent Packets: The percentage of protocol packets out of the total captured packets
Packets: The number of protocol packets out of the total captured packets
Percent Bytes: The percentage of protocol bytes out of the total captured bytes
Bytes: The number of protocol bytes out of the total captured bytes
Bit/s: The bandwidth of this protocol, in relation to the capture time
End Packets: The absolute number of packets of this protocol (for the highest protocol in the decode file)
End Bytes: The absolute number of bytes of this protocol (for the highest protocol in the decode file)
End Bit/s: The bandwidth of this protocol, relative to the capture packets and time (for the highest protocol in the decode file)

The End columns count the packets in which the protocol is the last protocol in the packet (that is, when the protocol comes at the end of the frame). These can be TCP packets with no payload (for example, SYN packets), which do not carry upper-layer protocols. That is why you see a zero count for Ethernet, IPv4, and UDP end packets; there are no frames where those protocols are the last protocol in the frame.

In this file example, we can see two interesting issues:

We can see 1,842 packets of DHCPv6. If IPv6 and DHCPv6 are not required, disable them.
We see more than 200,000 Check Point High Availability (CPHA) packets, which is 74.7% of the packets sent over the network we monitored.
These are synchronization packets that are sent between two firewalls working in a cluster, updating session tables between the firewalls. Such a volume of packets can severely affect performance. The solution is to configure a dedicated link between the firewalls so that session table updates do not load the network.

How it works...

Simply put, Wireshark calculates statistics over the captured data. Some important things to notice:

The percentage always refers to protocols at the same layer. For example, in the following screenshot, we see that logical link control has 0.5% of the packets that run over Ethernet, IPv6 has 1.0%, IPv4 has 88.8%, ARP has 9.6%, and even the old Cisco ISL has 0.1%, for a total of 100% of the protocols over layer 2 Ethernet.
On the other hand, we see that TCP has 75.70% of the data, and inside TCP, only 12.74% of the packets are HTTP, and that is almost it. This is because Wireshark counts only the packets with HTTP headers. It doesn't count, for example, the ACK packets, data packets, and so on.

Using the statistics for conversations menu

In this recipe, we will learn how to get conversation information for the data that runs over the network. Start Wireshark and click on Statistics.

How to do it...

From the Statistics menu, choose Conversations. The following window will come up. You can choose between layer 2 Ethernet statistics, layer 3 IP statistics, or layer 4 TCP or UDP statistics. You can use these statistics tools as follows:

On layer 2 (Ethernet): To find and isolate broadcast storms
On layer 3/layer 4 (TCP/IP): To connect in parallel to the internet router port, and check who is loading the line to the ISP

If you see that there is a lot of traffic going out to port 80 (HTTP) on a specific IP address on the internet, you just have to copy the address to your browser and find the website that is most popular with your users.
If you don't get anything, simply go to a standard DNS resolution website (search Google for DNS lookup) and find out what is loading your internet line.

To view IP addresses as names, check the Name resolution checkbox (1 in the previous screenshot). For name resolution to work, you first have to enable it by choosing View | Name Resolution | Enable for Network layer.

You can also limit the conversations statistics to a display filter by checking the Limit to display filter checkbox (2). In this way, statistics will be presented only for the packets that pass the display filter.

A new feature in Wireshark version 2 is the graph feature, marked as (5) in the previous screenshot. When you choose a specific line in the TCP conversations statistics and click Graph..., it brings you to the TCP time/sequence (tcptrace) stream graph.

To copy table data, click on the Copy button (3). In TCP or UDP, you can mark a specific line and then click on the Follow Stream... button (4). This will define a display filter that shows you the specific stream of data. As you can see in the following screenshot, you can also right-click a line and choose to prepare or apply a filter, or to colorize a data stream.

We also see that, unlike the previous Wireshark version, in which all types of protocols appeared in the upper tabs, here we can choose which protocols to see, and only the identified protocols are presented by default.

How it works...

A network conversation is the traffic between two specific endpoints. For example, an IP conversation is all the traffic between two IP addresses, and TCP conversations present all TCP connections.

Using the statistics for endpoints menu

In this recipe, we will learn how to get endpoint statistics information for the captured data. Start Wireshark and click on Statistics.

How to do it...
To view the endpoint statistics, follow these steps:

From the Statistics menu, choose Endpoints. The following window will come up.
In this window, you will be able to see layer 2, 3, and 4 endpoints, that is, Ethernet, IP, and TCP or UDP.

From the left-hand side of the window you can see (here is an example for the TCP tab):

Endpoint IP address and port number on this host
Total packets and bytes sent from and received by this host
Packets to the host (Packets A → B) and bytes to the host (Bytes A → B)
Packets from the host (Packets B → A) and bytes from the host (Bytes B → A)
The Latitude and Longitude columns, applicable when GeoIP is configured

At the bottom of the window we have the following controls:

Name resolution: Provides name resolution in cases where it is configured in the name resolution settings under the View menu.
Limit to display filter: Shows statistics only for the display filter configured in the main window.
Copy: Copies the list values to the clipboard in CSV or YAML format.
Map: In cases where GeoIP is configured, shows the geographic information on a map.

How it works...

Quite simply, it gives statistics on all the endpoints Wireshark has discovered. These can reflect situations such as the following:

Few Ethernet end nodes (even one), that is, MAC addresses, with many IP end nodes (that is, IP addresses): this will be the case where, for example, we have a router that sends/receives packets from many remote devices.
Few IP end nodes with many TCP end nodes: this will be the case for many TCP connections per host. This can be the regular operation of a server with many connections, or it could be a kind of attack that comes through the network (SYN attack).

We learned about Wireshark's basic statistics tools and how you can leverage them for network analysis. Get over 100 recipes to analyze and troubleshoot network problems using Wireshark 2 from this book Network Analysis using Wireshark 2 Cookbook - Second Edition.

Bhagyashree R
07 Sep 2018
11 min read

Building a Twitter news bot using Twitter API [Tutorial]

This article is an excerpt from a book written by Srini Janarthanam titled Hands-On Chatbots and Conversational UI Development. In this article, we will explore the Twitter API and build core modules for tweeting, searching, and retweeting. We will further explore a data source for news around the globe and build a simple bot that tweets top news on its timeline. Getting started with the Twitter app To get started, let us explore the Twitter developer platform. Let us begin by building a Twitter app and later explore how we can tweet news articles to followers based on their interests: Log on to Twitter. If you don't have an account on Twitter, create one. Go to Twitter Apps, which is Twitter's application management dashboard. Click the Create New App button: Create an application by filling in the form providing name, description, and a website (fully-qualified URL). Read and agree to the Developer Agreement and hit Create your Twitter application: You will now see your application dashboard. Explore the tabs: Click Keys and Access Tokens: Copy consumer key and consumer secret and hang on to them. Scroll down to Your Access Token: Click Create my access token to create a new token for your app: Copy the Access Token and Access Token Secret and hang on to them. Now, we have all the keys and tokens we need to create a Twitter app. Building your first Twitter bot Let's build a simple Twitter bot. This bot will listen to tweets and pick out those that have a particular hashtag. All the tweets with a given hashtag will be printed on the console. This is a very simple bot to help us get started. In the following sections, we will explore more complex bots. To follow along you can download the code from the book's GitHub repository. Go to the root directory and create a new Node.js program using npm init: Execute the npm install twitter --save command to install the Twitter Node.js library: Run npm install request --save to install the Request library as well. 
We will use this in the future to make HTTP GET requests to a news data source. Explore your package.json file in the root directory: { "name": "twitterbot", "version": "1.0.0", "description": "my news bot", "main": "index.js", "scripts": { "test": "echo \"Error: no test specified\" && exit 1" }, "author": "", "license": "ISC", "dependencies": { "request": "^2.81.0", "twitter": "^1.7.1" } } Create an index.js file with the following code: //index.js var TwitterPackage = require('twitter'); var request = require('request'); console.log("Hello World! I am a twitter bot!"); var secret = { consumer_key: 'YOUR_CONSUMER_KEY', consumer_secret: 'YOUR_CONSUMER_SECRET', access_token_key: 'YOUR_ACCESS_TOKEN_KEY', access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET' } var Twitter = new TwitterPackage(secret); In the preceding code, put the keys and tokens you saved in their appropriate variables. We don't need the request package just yet, but we will later. Now let's create a hashtag listener to listen to the tweets on a specific hashtag: //Twitter stream var hashtag = '#brexit'; //put any hashtag to listen e.g. #brexit console.log('Listening to:' + hashtag); Twitter.stream('statuses/filter', {track: hashtag}, function(stream) { stream.on('data', function(tweet) { console.log('Tweet:@' + tweet.user.screen_name + '\t' + tweet.text); console.log('------') }); stream.on('error', function(error) { console.log(error); }); }); Replace #brexit with the hashtag you want to listen to. Use a popular one so that you can see the code in action. Run the index.js file with the node index.js command. You will see a stream of tweets from Twitter users all over the globe who used the hashtag: Congratulations! You have built your first Twitter bot. Exploring the Twitter SDK In the previous section, we explored how to listen to tweets based on hashtags. Let's now explore the Twitter SDK to understand the capabilities that we can bestow upon our Twitter bot. 
Updating your status

You can also update your status on your Twitter timeline by using the following status update module code:

tweet('I am a Twitter Bot!', null, null);

function tweet(statusMsg, screen_name, status_id) {
    console.log('Sending tweet to: ' + screen_name);
    console.log('In response to:' + status_id);
    var msg = statusMsg;
    // prefix the message with the user's handle when replying
    if (screen_name != null) {
        msg = '@' + screen_name + ' ' + statusMsg;
    }
    console.log('Tweet:' + msg);
    Twitter.post('statuses/update', { status: msg }, function(err, response) {
        // if there was an error while tweeting
        if (err) {
            console.log('Something went wrong while TWEETING...');
            console.log(err);
        } else if (response) {
            console.log('Tweeted!!!');
            console.log(response);
        }
    });
}

Comment out the hashtag listener code, add the preceding status update code instead, and run it. When run, your bot will post a tweet on your timeline.

In addition to tweeting on your timeline, you can also tweet in response to another tweet (or status update). The screen_name argument is used to create a response tweet; screen_name is the name of the user who posted the tweet. We will explore this a bit later.
Retweet to your followers

You can retweet a tweet to your followers using the following retweet status code:

var retweetId = '899681279343570944';
retweet(retweetId);

function retweet(retweetId) {
    Twitter.post('statuses/retweet/', { id: retweetId }, function(err, response) {
        if (err) {
            console.log('Something went wrong while RETWEETING...');
            console.log(err);
        } else if (response) {
            console.log('Retweeted!!!');
            console.log(response);
        }
    });
}

Searching for tweets

You can also search for recent or popular tweets with hashtags using the following search hashtags code:

search('#brexit', 'popular');

function search(hashtag, resultType) {
    var params = {
        q: hashtag, // REQUIRED
        result_type: resultType,
        lang: 'en'
    };
    Twitter.get('search/tweets', params, function(err, data) {
        if (!err) {
            console.log('Found tweets: ' + data.statuses.length);
            // arrays are zero-indexed, so the first tweet is statuses[0]
            if (data.statuses.length > 0) {
                console.log('First one: ' + data.statuses[0].text);
            }
        } else {
            console.log('Something went wrong while SEARCHING...');
        }
    });
}

Exploring a news data service

Let's now build a bot that will tweet news articles to its followers at regular intervals. We will then extend it to be personalized by users through a conversation that happens over direct messaging with the bot. In order to build a news bot, we need a source where we can get news articles. We are going to explore a news service called NewsAPI.org in this section. News API is a service that aggregates news articles from roughly 70 newspapers around the globe.

Setting up News API

Let us set up an account with the News API data service and get the API key:

Go to NewsAPI.org.
Click Get API key.
Register using your email.
Get your API key.
Explore the sources: https://newsapi.org/v1/sources?apiKey=YOUR_API_KEY

There are about 70 sources from across the globe, including popular ones such as BBC News, Associated Press, Bloomberg, and CNN. You might notice that each source has a category tag attached.
The possible options are: business, entertainment, gaming, general, music, politics, science-and-nature, sport, and technology. You might also notice that each source also has language (en, de, fr) and country (au, de, gb, in, it, us) tags. The following is the information on the BBC-News source: { "id": "bbc-news", "name": "BBC News", "description": "Use BBC News for up-to-the-minute news, breaking news, video, audio and feature stories. BBC News provides trusted World and UK news as well as local and regional perspectives. Also entertainment, business, science, technology and health news.", "url": "http://www.bbc.co.uk/news", "category": "general", "language": "en", "country": "gb", "urlsToLogos": { "small": "", "medium": "", "large": "" }, "sortBysAvailable": [ "top" ] } Get sources for a specific category, language, or country using: https://newsapi.org/v1/sources?category=business&apiKey=YOUR_API_KEY The following is the part of the response to the preceding query asking for all sources under the business category: "sources": [ { "id": "bloomberg", "name": "Bloomberg", "description": "Bloomberg delivers business and markets news, data, analysis, and video to the world, featuring stories from Businessweek and Bloomberg News.", "url": "http://www.bloomberg.com", "category": "business", "language": "en", "country": "us", "urlsToLogos": { "small": "", "medium": "", "large": "" }, "sortBysAvailable": [ "top" ] }, { "id": "business-insider", "name": "Business Insider", "description": "Business Insider is a fast-growing business site with deep financial, media, tech, and other industry verticals. Launched in 2007, the site is now the largest business news site on the web.", "url": "http://www.businessinsider.com", "category": "business", "language": "en", "country": "us", "urlsToLogos": { "small": "", "medium": "", "large": "" }, "sortBysAvailable": [ "top", "latest" ] }, ... 
]

Explore the articles: https://newsapi.org/v1/articles?source=bbc-news&apiKey=YOUR_API_KEY

The following is the sample response:

"articles": [
    {
        "author": "BBC News",
        "title": "US Navy collision: Remains found in hunt for missing sailors",
        "description": "Ten US sailors have been missing since Monday's collision with a tanker near Singapore.",
        "url": "http://www.bbc.co.uk/news/world-us-canada-41013686",
        "urlToImage": "https://ichef1.bbci.co.uk/news/1024/cpsprodpb/80D9/production/_97458923_mediaitem97458918.jpg",
        "publishedAt": "2017-08-22T12:23:56Z"
    },
    {
        "author": "BBC News",
        "title": "Afghanistan hails Trump support in 'joint struggle'",
        "description": "President Ghani thanks Donald Trump for supporting Afghanistan's battle against the Taliban.",
        "url": "http://www.bbc.co.uk/news/world-asia-41012617",
        "urlToImage": "https://ichef.bbci.co.uk/images/ic/1024x576/p05d08pf.jpg",
        "publishedAt": "2017-08-22T11:45:49Z"
    },
    ...
]

For each article, the author, title, description, url, urlToImage, and publishedAt fields are provided. Now that we have explored a source of news data that provides up-to-date news stories under various categories, let us go on to build a news bot.

Building a Twitter news bot

Now that we have explored News API, a data source for the latest news updates, and a little bit of what the Twitter API can do, let us combine them both to build a bot that tweets interesting news stories, first on its own timeline and then specifically to each of its followers.

Let's build a news tweeter module that tweets the top news article given the source.
The following code uses the tweet() function we built earlier:

topNewsTweeter('cnn', null);

function topNewsTweeter(newsSource, screen_name, status_id) {
    request({
        url: 'https://newsapi.org/v1/articles?source=' + newsSource + '&apiKey=YOUR_API_KEY',
        method: 'GET'
    }, function (error, response, body) {
        // response comes from News API
        if (!error && response.statusCode == 200) {
            var botResponse = JSON.parse(body);
            console.log(botResponse);
            tweetTopArticle(botResponse.articles, screen_name);
        } else {
            console.log('Sorry. No news');
        }
    });
}

function tweetTopArticle(articles, screen_name, status_id) {
    // the first article in the list is the top story
    var article = articles[0];
    tweet(article.title + " " + article.url, screen_name);
}

Run the preceding program to fetch news from CNN and post the topmost article on Twitter.

Now, let us build a module that tweets news stories from a randomly-chosen source in a list of sources:

function tweetFromRandomSource(sources, screen_name, status_id) {
    var max = sources.length;
    // Math.random() is in [0, 1), so this index is always between 0 and max - 1
    var randomSource = sources[Math.floor(Math.random() * max)];
    topNewsTweeter(randomSource, screen_name, status_id);
}

Let's call the tweeting module after we acquire the list of sources:

function getAllSourcesAndTweet() {
    var sources = [];
    console.log('getting sources...');
    request({
        url: 'https://newsapi.org/v1/sources?apiKey=YOUR_API_KEY',
        method: 'GET'
    }, function (error, response, body) {
        // response comes from News API
        if (!error && response.statusCode == 200) {
            // Collect the id of every source in the response body
            var botResponse = JSON.parse(body);
            for (var i = 0; i < botResponse.sources.length; i++) {
                console.log('adding.. ' + botResponse.sources[i].id);
                sources.push(botResponse.sources[i].id);
            }
            tweetFromRandomSource(sources, null, null);
        } else {
            console.log('Sorry. No news sources!');
        }
    });
}

Let's create a new JS file called tweeter.js.
In the tweeter.js file, call getAllSourcesAndTweet() to get the process started:

//tweeter.js
var TwitterPackage = require('twitter');
var request = require('request');

console.log("Hello World! I am a twitter bot!");

var secret = {
    consumer_key: 'YOUR_CONSUMER_KEY',
    consumer_secret: 'YOUR_CONSUMER_SECRET',
    access_token_key: 'YOUR_ACCESS_TOKEN_KEY',
    access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
};

var Twitter = new TwitterPackage(secret);

getAllSourcesAndTweet();

Run the tweeter.js file on the console. This bot will tweet a news story every time it is called. It will choose top news stories from around 70 news sources randomly.

Hurray! You have built your very own Twitter news bot. In this tutorial, we have covered a lot. We started off with the Twitter API and got a taste of how we can automatically tweet, retweet, and search for tweets using hashtags. We then explored a news source API that provides news articles from about 70 different newspapers. We integrated it with our Twitter bot to create a new tweeting bot.

If you found this post useful, do check out the book, Hands-On Chatbots and Conversational UI Development, which will help you explore the world of conversational user interfaces.

Melisha Dsouza
21 Mar 2019
12 min read

Debugging and Profiling Python Scripts [Tutorial]

Debugging and profiling play an important role in Python development. A debugger helps programmers analyze their code by setting breakpoints and stepping through it, whereas profilers run our code and report the details of its execution time, which helps identify the bottlenecks in your programs. In this tutorial, we'll learn about the pdb Python debugger, the cProfile module, and the timeit module for timing the execution of Python code.

This tutorial is an excerpt from a book written by Ganesh Sanjiv Naik titled Mastering Python Scripting for System Administrators. This book will show you how to leverage Python for tasks ranging from text processing, network administration, GUI building, and web scraping to database administration, including data analytics and reporting.

Python debugging techniques

Debugging is the process of resolving the issues in your code that prevent your software from running properly. Python makes debugging straightforward: the Python debugger can set conditional breakpoints and step through the source code one line at a time. We'll debug our Python scripts using the pdb module, which is part of the Python standard library. We're going to look at four techniques for Python debugging:

print() statement: This is the simplest way of finding out exactly what is happening, so you can check what has been executed.
logging: This is like a print statement but with more contextual information, so you can understand what happened more fully.
pdb debugger: This is a commonly used debugging technique. The advantage of pdb is that you can use it from the command line, within an interpreter, and within a program.
IDE debugger: An IDE has an integrated debugger. It allows developers to execute their code and inspect it while the program executes.

Error handling (exception handling)

In this section, we're going to learn how Python handles exceptions.
An exception is an error that occurs during program execution. Whenever an error occurs, Python raises an exception, which can be handled using a try…except block. Exceptions that the program does not handle result in an error message (a traceback). In your Terminal, start the python3 interactive console and we will see some exception examples:

```
student@ubuntu:~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> 50 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>>
>>> 6 + abc*5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'abc' is not defined
>>>
>>> 'abc' + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
>>>
>>> import abcd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'abcd'
>>>
```

These are some examples of exceptions. Now, we will see how we can handle them. Whenever errors occur in your Python program, exceptions are raised. We can also raise an exception deliberately using the raise keyword.

Now we are going to see a try…except block that handles an exception. In the try block, we write the code that may raise an exception. In the except block, we write the code that deals with that exception. The syntax for try…except is as follows:

```
try:
    statement(s)
except:
    statement(s)
```

A try block can have multiple except clauses. We can also handle a specific exception by writing the exception name after the except keyword. The syntax for handling a specific exception is as follows:

```
try:
    statement(s)
except exception_name:
    statement(s)
```

We are going to create an exception_example.py script to catch ZeroDivisionError.
Write the following code in your script:

```python
a = 35
b = 57
try:
    c = a + b
    print("The value of c is: ", c)
    d = b / 0
    print("The value of d is: ", d)
except ZeroDivisionError:
    # Catch only the error we expect, rather than using a bare except
    print("Division by zero is not possible")
print("Out of try...except block")
```

Run the script as follows and you will get the following output:

```
student@ubuntu:~$ python3 exception_example.py
The value of c is: 92
Division by zero is not possible
Out of try...except block
```

Debugger tools

There are many debugging tools available for Python:

winpdb
pydev
pydb
pdb
gdb
pyDebug

In this section, we are going to learn about the pdb Python debugger. The pdb module is part of Python's standard library and is always available to use.

The pdb debugger

The pdb module is an interactive source code debugger for Python programs. With pdb you can set breakpoints, inspect stack frames, and list the source code. Now we will learn how to use the pdb debugger. There are three ways to use this debugger:

Within an interpreter
From a command line
Within a Python script

We are going to create a pdb_example.py script and add the following content to it:

```python
class Student:
    def __init__(self, std):
        self.count = std

    def print_std(self):
        for i in range(self.count):
            print(i)
        return

if __name__ == '__main__':
    Student(5).print_std()
```

Using this script as an example, we will see in detail how we can start the debugger.

Within an interpreter

To start the debugger from the Python interactive console, we use run() or runeval(). Start your python3 interactive console by running the following command:

```
$ python3
```

Import our pdb_example script and the pdb module.
Now, we are going to use run(), passing it a string expression as an argument that will be evaluated by the Python interpreter itself:

```
student@ubuntu:~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pdb_example
>>> import pdb
>>> pdb.run('pdb_example.Student(5).print_std()')
> <string>(1)<module>()
(Pdb)
```

To continue debugging, enter continue after the (Pdb) prompt and press Enter. If you want to see the commands available here, press the Tab key twice at the (Pdb) prompt. After entering continue, we will get the output as follows:

```
student@ubuntu:~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pdb_example
>>> import pdb
>>> pdb.run('pdb_example.Student(5).print_std()')
> <string>(1)<module>()
(Pdb) continue
0
1
2
3
4
>>>
```

From a command line

The simplest and most straightforward way to run a debugger is from the command line. Our program acts as input to the debugger. You can use the debugger from the command line as follows:

```
$ python3 -m pdb pdb_example.py
```

When you run the debugger from the command line, the source code is loaded and execution stops on the first line it finds. Enter continue to continue debugging. Here's the output:

```
student@ubuntu:~$ python3 -m pdb pdb_example.py
> /home/student/pdb_example.py(1)<module>()
-> class Student:
(Pdb) continue
0
1
2
3
4
The program finished and will be restarted
> /home/student/pdb_example.py(1)<module>()
-> class Student:
(Pdb)
```

Within a Python script

The previous two techniques start the debugger at the beginning of a Python program. This third technique is better suited to long-running processes, because it lets you start the debugger at a chosen point. To start the debugger within a script, use set_trace().
Now, modify your pdb_example.py file as follows:

```python
import pdb

class Student:
    def __init__(self, std):
        self.count = std

    def print_std(self):
        for i in range(self.count):
            pdb.set_trace()
            print(i)
        return

if __name__ == '__main__':
    Student(5).print_std()
```

Now, run the program as follows:

```
student@ubuntu:~$ python3 pdb_example.py
> /home/student/pdb_example.py(10)print_std()
-> print(i)
(Pdb) continue
0
> /home/student/pdb_example.py(9)print_std()
-> pdb.set_trace()
(Pdb)
```

set_trace() is a Python function, so you can call it at any point in your program. These are the three ways in which you can start the debugger.

Debugging basic program crashes

In this section, we are going to look at the trace module, which traces program execution. When your Python program crashes, the trace shows where it crashed. We can use the trace module by importing it into a script as well as from the command line. Now, we will create a script named trace_example.py and write the following content in it:

```python
class Student:
    def __init__(self, std):
        self.count = std

    def go(self):
        for i in range(self.count):
            print(i)
        return

if __name__ == '__main__':
    Student(5).go()
```

Run the script under the trace module from the command line. The output will be as follows:

```
student@ubuntu:~$ python3 -m trace --trace trace_example.py
 --- modulename: trace_example, funcname: <module>
trace_example.py(1): class Student:
 --- modulename: trace_example, funcname: Student
trace_example.py(1): class Student:
trace_example.py(2): def __init__(self, std):
trace_example.py(5): def go(self):
trace_example.py(10): if __name__ == '__main__':
trace_example.py(11): Student(5).go()
 --- modulename: trace_example, funcname: __init__
trace_example.py(3): self.count = std
 --- modulename: trace_example, funcname: go
trace_example.py(6): for i in range(self.count):
trace_example.py(7): print(i)
0
trace_example.py(6): for i in range(self.count):
trace_example.py(7): print(i)
1
trace_example.py(6): for i in range(self.count):
trace_example.py(7): print(i)
2
trace_example.py(6): for i in range(self.count):
trace_example.py(7): print(i)
3
trace_example.py(6): for i in range(self.count):
trace_example.py(7): print(i)
4
```

So, by using trace --trace at the command line, the developer can trace the program line by line. Whenever the program crashes, the last lines of the trace show where it crashed.

Profiling and timing programs

Profiling a Python program means measuring its execution time, broken down by the time spent in each function. Python's cProfile module is used for profiling a Python program.

The cProfile module

We are going to use the cProfile module to profile a program. Now, we will create a cprof_example.py script and write the following code in it:

```python
mul_value = 0
def mul_numbers(num1, num2):
    # This assignment creates a local variable; the global mul_value is unchanged
    mul_value = num1 * num2
    print("Local Value: ", mul_value)
    return mul_value

mul_numbers(58, 77)
print("Global Value: ", mul_value)
```

Run the program and you will see the output as follows:

```
student@ubuntu:~$ python3 -m cProfile cprof_example.py
Local Value: 4466
Global Value: 0
         6 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 cprof_example.py:1(<module>)
        1    0.000    0.000    0.000    0.000 cprof_example.py:2(mul_numbers)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```

So, using cProfile, every function that is called is listed together with the time spent in it.
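The command-line run above profiles the whole script. As a minimal sketch of the same idea from inside a program, cProfile can also be driven in code and its results sorted with the standard pstats module. The helper names busy_work and profile_report are hypothetical and are not part of the book's example; mul_numbers is reused without its print call so the report stays readable.

```python
import cProfile
import io
import pstats


def mul_numbers(num1, num2):
    """Same multiplication as in cprof_example.py, minus the print."""
    return num1 * num2


def busy_work(n):
    """Hypothetical helper: call mul_numbers n times so there is something to measure."""
    total = 0
    for i in range(n):
        total += mul_numbers(i, 2)
    return total


def profile_report(n=10000, top=5):
    """Profile busy_work(n) and return the report text, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(n)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

Sorting by cumulative time tends to put the functions that dominate the run at the top of the report, which is usually where a search for bottlenecks starts.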
Now, we will see what these column headings mean:

ncalls: Number of calls
tottime: Total time spent in the given function, excluding time spent in the functions it calls
percall: Quotient of tottime divided by ncalls
cumtime: Cumulative time spent in this function and all the functions it calls
percall: Quotient of cumtime divided by primitive calls
filename:lineno(function): The file, line number, and name of each function

timeit

timeit is a Python module used to time small parts of your Python script. You can call timeit from the command line as well as import the timeit module into your script. We are going to write a script to time a piece of code. Create a timeit_example.py script and write the following content into it:

```python
import timeit

prg_setup = "from math import sqrt"

prg_code = '''
def timeit_example():
    list1 = []
    for x in range(50):
        list1.append(sqrt(x))

# Call the function; otherwise only the def statement itself would be timed
timeit_example()
'''

# timeit statement
print(timeit.timeit(setup=prg_setup, stmt=prg_code, number=10000))
```

Using timeit, we can decide which piece of code we want to measure the performance of. We define the setup code and the code snippet under test separately. The snippet runs 10,000 times here because we pass number = 10000 (the default is 1 million), whereas the setup code runs only once.

Making programs run faster

There are various ways to make your Python programs run faster, such as the following:

Profile your code so you can identify the bottlenecks
Use built-in functions and libraries, so that loops run inside optimized library code rather than in interpreted Python
Avoid using globals in hot code paths, because looking up a global variable is slower than looking up a local one
Use existing packages

Summary

In this tutorial, we learned about the importance of debugging and profiling programs. We looked at the different techniques available for debugging, learned about the pdb Python debugger and how to handle exceptions, and saw how to use the cProfile and timeit modules to profile and time our scripts. We also learned how to make scripts run faster.
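As a small illustration of the timeit module and the "use built-in functions" advice from the sections above, the following sketch times an explicit loop against the built-in map() for the same square-root task. The function names are hypothetical, and the timings depend on your machine and Python version, so no particular result is claimed.

```python
import timeit
from math import sqrt


def roots_with_loop(n=50):
    """Build the list of square roots with an explicit loop, as in timeit_example.py."""
    result = []
    for x in range(n):
        result.append(sqrt(x))
    return result


def roots_with_map(n=50):
    """Same result, but the iteration is done by the built-in map()."""
    return list(map(sqrt, range(n)))


def time_both(number=2000):
    """Return (loop_seconds, map_seconds) for `number` runs of each function."""
    loop_time = timeit.timeit(roots_with_loop, number=number)
    map_time = timeit.timeit(roots_with_map, number=number)
    return loop_time, map_time


if __name__ == "__main__":
    loop_time, map_time = time_both()
    print("loop:", loop_time)
    print("map: ", map_time)
```

Passing a callable to timeit.timeit() avoids quoting the code as a string, and checking that both functions return the same list before comparing their timings keeps the comparison fair.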
To learn how to use the latest features of Python and build powerful tools that solve challenging, real-world tasks, check out our book Mastering Python Scripting for System Administrators.

Guest Contributor
16 Apr 2019
16 min read

Understanding network port numbers, TCP, UDP, and ICMP on an operating system

As a student, professional, or enthusiast interested in computer networking, it is important to have a firm understanding of logical (internal) ports on an operating system, why they are needed, and the protocols that use them.

This article is an excerpt taken from the book CompTIA Network+ Certification Guide written by Glen D. Singh and Rishi Latchmepersad. This book will help you understand topics like network architecture, security, network monitoring, troubleshooting and much more. This article provides you with an introduction to understanding network port numbers, TCP, UDP, and ICMP.

The term “ports” or “network ports” usually means the physical interfaces on a device, such as a router, switch, server, or even a personal computer. However, in addition to these physical ports, there are also logical ports within an operating system or a device. You may ask yourself, how does a logical port exist within a computer, server, or network appliance such as a router or switch? Here, we are going to break down the concept of these logical ports, which are also known as network ports.

To get started, we will use a simple analogy to help you understand the fundamentals of logical ports on a system. Let’s imagine you own an organization whose headquarters is a single building with many floors. At the center of the building are the elevators, which give easy access to the upper floors. Each floor is occupied by a unique department and its staff members. Each day, the employees use the elevators, which transport them to their departments and back. Now imagine that the physical building is a computing system such as a server, that there is a door at each department, and that the employees are the different types of network traffic entering and leaving the system on a daily basis. Now let’s put all the pieces together and get everything working in harmony.
Each time an employee (network traffic) enters the building (operating system), he/she takes the elevator (Transport Layer), which delivers the employee to the doorway (logical port) of their department (service/protocol at the Application Layer). From this analogy, you may have realized that each type of network traffic (employee) enters its department through a doorway. This doorway is a logical port that exists within the operating system (building) and is not visible to any entity outside the system. Each type of network traffic is sent to a specific logical port for further processing before it is delivered to the Application Layer.

The Internet Assigned Numbers Authority (IANA) is the governing body that manages and regulates Internet Protocol (IP) address and port number assignments. According to IANA's Service Name and Transport Protocol Port Number Registry, port numbers range from 0 to 65535. A port number can be used with the Transmission Control Protocol (TCP), with the User Datagram Protocol (UDP), or with both. The port numbers are divided into three ranges for easy identification:

System (well-known) ports: 0 to 1023
User (registered) ports: 1024 to 49151
Dynamic (private) ports: 49152 to 65535

[box type="shadow" align="" class="" width=""]Get further information on the assignments of port by Internet Assigned Numbers Authority (IANA) on its official website.[/box]

The Internet Engineering Task Force (IETF) defines the procedures for managing service names and port numbers in RFC 6335. Now that we have a clear understanding of the role of ports on a system, let’s dive a bit deeper and define some of the well-known ports and their purposes on a network.

Network Protocols and their Port numbers

A network protocol defines the rules and procedures by which data communication occurs between devices over a network. Without predefined rules or procedures, the messages traversing a network would have no particular formatting and might not be meaningful to the recipient device.
To further discuss the importance of having protocols on a network/system, we will use the following analogy to compare network protocols with a real-world situation. Let’s imagine you work for an organization, ACME Corp, and within the company there are many policies and procedures that govern the handling of day-to-day transactions and activities. One of the most important procedures is the emergency evacuation plan. If there’s an emergency within the organization, the procedure documents the rules and guidelines each employee must follow to ensure they are escorted safely out of the compound to the muster point, where they wait while the health and safety officers conduct their checks before allowing anyone to re-enter the compound. If proper procedures and guidelines didn’t exist within ACME Corp, people would attempt to exit the compound in a haphazard manner, which might result in further safety issues. With procedures and guidelines, the employees evacuate in a systematic manner.

The same concept applies on a network. There are many different protocols that use a network to communicate with another device. Each protocol has its own way of formatting information and its own rules and procedures, which it follows while traveling on the network until the data is received by the intended recipient and processed upwards through the Open Systems Interconnection (OSI) reference model or the Transmission Control Protocol/Internet Protocol (TCP/IP) stack.

[box type="shadow" align="" class="" width=""]The ISO Open Systems Interconnection (OSI) model is simply a reference model and is not actually implemented on a system. However, network professionals use this model mostly during network and security discussions and when troubleshooting.
The Transmission Control Protocol/Internet Protocol (TCP/IP) stack is implemented in all network related devices.[/box]

Now that you have understood the concept of network protocols, let’s discuss some of the popular protocols, their respective port numbers, and their importance on a network.

Protocol Types

Internet Control Message Protocol (ICMP)

On a network, whether a Local Area Network (LAN) or a Wide Area Network (WAN), host devices communicate to exchange data and information with each other, and sometimes an error occurs. Let’s imagine you are sending a packet to a server on the internet, and while your computer is initializing the connection to the remote server, it reports an error stating that it is unable to connect. As an upcoming networking professional, you may wonder why the two devices are unable to establish a connection.

Internet Control Message Protocol (ICMP), defined by RFC 792, is typically used to provide error reporting on a network. There are many types of ICMP messages, which report whether an error has occurred and what the issue is.

Internet Control Message Protocol (ICMP) Message Types

There are many ICMP message types; we’ll discuss the main ones, which will be very useful to you as a network professional.

ICMP Type 0 – Echo Reply

The Type 0 message is sent by a device in response to an ICMP Type 8, Echo Request.

ICMP Type 3 – Destination Unreachable

Type 3 is given when a destination cannot be found or is simply unreachable by the sender. ICMP Type 3 gives a bit more detail by adding a Code to the message:

Code 0 – Network Unreachable
Code 1 – Host Unreachable
Code 2 – Protocol Unreachable
Code 3 – Port Unreachable

Combining the ICMP Type 3 message with a specific Code therefore gives you, the network professional, a better idea of the error on the network.
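The Type and Code pairing described above can be sketched as a simple lookup. This is only an illustration of how the two numbers combine into one diagnosis: the function name describe_icmp is hypothetical, and the tables cover only the types and codes named in this article so far, not the full IANA registry.

```python
# ICMP message types mentioned so far in this article (numbers per RFC 792).
ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    8: "Echo Request",
}

# Codes that refine a Type 3 (Destination Unreachable) message.
DEST_UNREACHABLE_CODES = {
    0: "Network Unreachable",
    1: "Host Unreachable",
    2: "Protocol Unreachable",
    3: "Port Unreachable",
}


def describe_icmp(icmp_type, code=0):
    """Return a human-readable description for an ICMP type and code."""
    type_name = ICMP_TYPES.get(icmp_type, "Unknown type")
    description = "Type {} ({})".format(icmp_type, type_name)
    # Only Type 3 codes are interpreted in this sketch.
    if icmp_type == 3:
        code_name = DEST_UNREACHABLE_CODES.get(code, "Unknown code")
        description += ", Code {} ({})".format(code, code_name)
    return description


if __name__ == "__main__":
    print(describe_icmp(3, 1))
    print(describe_icmp(8))
```

A real tool would read the type and code from the first two bytes of the ICMP header; here they are passed in directly to keep the sketch self-contained.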
ICMP Type 5 – Redirect

An ICMP Type 5 message occurs when a default gateway device such as a router notifies the sender to send the traffic directly to another gateway that exists on the same network. One reason can be that the second gateway device or router has a better route or a shorter path to the destination.

ICMP Type 8 – Echo Request

The ICMP Type 8 message is used by a sender device to check for basic network connectivity between itself and the intended recipient device. Any device receiving an ICMP Type 8 message responds with an ICMP Type 0 – Echo Reply.

ICMP Type 11 – Time Exceeded

Type 11 is given when the Time to Live (TTL) expires, that is, reaches zero (0), before the packet reaches the intended recipient device. The last gateway, which decrements the TTL to zero (0), notifies the sender using an ICMP Type 11 message, as displayed below. The -i parameter sets the Time To Live (TTL) value on the ICMP message.

```
C:\>ping 8.8.8.8 -i 4

Pinging 8.8.8.8 with 32 bytes of data:
Reply from 179.60.213.149: TTL expired in transit.
Reply from 179.60.213.66: TTL expired in transit.
Reply from 179.60.213.66: TTL expired in transit.
Reply from 179.60.213.66: TTL expired in transit.

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
```

Without adjusting the Time To Live (TTL) value of the ICMP Type 8 message, the sender receives ICMP Type 0 messages, indicating successful transmission between both devices:
```
C:\>ping 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Reply from 8.8.8.8: bytes=32 time=52ms TTL=120
Reply from 8.8.8.8: bytes=32 time=52ms TTL=120
Reply from 8.8.8.8: bytes=32 time=52ms TTL=120
Reply from 8.8.8.8: bytes=32 time=52ms TTL=120

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milliseconds:
    Minimum = 52ms, Maximum = 52ms, Average = 52ms
```

[box type="shadow" align="" class="" width=""]Further information of Internet Control Message Protocol (ICMP) can also be found at: https://tools.ietf.org/html/rfc792.  Further information of all the Internet Control Message Protocol (ICMP) message types can be found at: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml#icmp-parameters-codes-7.[/box]

A simple and easy-to-use utility is Ping. The Ping utility harnesses the functionality of ICMP and provides meaningful feedback on whether communication is successful, unsuccessful, or redirected, whether the destination host or network is unreachable, and so on. The Ping utility is integrated into almost every modern operating system, on desktops, servers, and mobile devices alike. The ping command can be executed in the Windows Command Prompt or the Terminal of Linux-based operating systems. When a user runs the ping command with a destination address, the ping utility sends an ICMP Type 8 message to the intended destination. The syntax for checking basic connectivity is as follows:

```
ping <ip address or hostname>
ping 8.8.8.8
ping www.google.com
```

Transmission Control Protocol (TCP)

When you send a letter using your local postal service, have you ever wondered whether your letter reaches the destination successfully, whether it was prioritized within the mail service's processing system, or what confirmation you would receive when the letter is delivered?
On a network, devices have the same concerns. If one device sends a datagram to another device, whether on the same Local Area Network (LAN) or on a remote network, what assurance is there that the datagram (message) is delivered from the sender to the receiver?

Transmission Control Protocol (TCP), defined by RFC 793, is a connection-oriented protocol that operates at the Transport Layer of both the Open Systems Interconnection (OSI) reference model and the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol stack. It is designed to provide reliable transport of datagrams over a network. It provides this assurance by performing a 3-way handshake before any data is communicated between the sender and the receiver.

Let’s use a simple analogy to explain the TCP 3-Way Handshake. We have two (2) devices, Bob and Alice. Bob wants to exchange data with Alice but needs to ensure the data being sent is successfully delivered, so Bob decides to use TCP to guarantee the delivery. Bob initializes the TCP 3-Way Handshake by sending a TCP Synchronization (SYN) packet to Alice, indicating he wants to establish a session or connection. Alice, upon receiving the SYN packet, responds to Bob with a TCP Synchronization and Acknowledgment (SYN/ACK) packet, indicating that she also wants to establish a session and acknowledging receipt of the SYN packet. Bob, upon receiving the TCP SYN/ACK packet from Alice, responds with a TCP Acknowledgment (ACK) packet. Now that the TCP 3-Way Handshake is complete, data can be exchanged between the two (2) devices. For the data sent across the session between Bob and Alice, ACK packets are returned to confirm successful delivery. What if Bob sends a message to Alice, and Bob does not receive an ACK from Alice?
In this situation, Bob would retransmit the data at certain intervals until an ACK packet is sent back to him. Another question you may have is, how does TCP terminate a session gracefully? Each device sends a TCP Finish (FIN) packet to the other, indicating that it would like to terminate the session, and each FIN is acknowledged with an ACK. Furthermore, if we use a network protocol analyzer tool such as Wireshark, we can see the composition of each packet passing across the network. The following exhibit is a capture using Wireshark during the writing of this book to demonstrate the TCP 3-Way Handshake.

[box type="shadow" align="" class="" width=""]Reassemble packet in order[/box]

User Datagram Protocol (UDP)

User Datagram Protocol (UDP), defined by RFC 768, is a connectionless protocol. This protocol also operates at the Transport Layer of both the Open Systems Interconnection (OSI) reference model and the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol stack. However, unlike TCP, UDP does not provide any guarantee or assurance of the delivery of datagrams across a network. Not all protocols at the Application Layer use TCP; there are many Layer 7 protocols that use UDP.

You may be wondering, why would an upper layer protocol use UDP instead of TCP? Let’s do a brief recap of TCP. When devices use TCP as their Transport Layer protocol, an Acknowledgment (ACK) packet is returned for each message sent between the sender and the receiver. In this simplified picture, if a sender such as Bob sends one hundred (100) packets to Alice over the network, Alice returns one hundred (100) ACK packets to Bob.
Let’s imagine a larger network with hundreds or thousands of devices, or even the Internet, where everyone uses TCP. The return traffic, in this case the ACK packets, would create a lot of overhead on the network and therefore cause congestion. This is similar to a roadway on which the number of vehicles keeps increasing until there is a traffic jam.

Let’s use another analogy. Many people around the world use YouTube, which has millions of users streaming content every day. Imagine that the video traffic used TCP instead of UDP. If each user sent TCP ACK packets back to YouTube on that very large scale, the YouTube network and even the Internet would be congested with TCP ACK packets, and network performance would degrade. This is why not all upper layer protocols use TCP.

UDP simply sends datagrams without any assurance or guarantee that the message is delivered. When devices communicate over a network, the path each packet takes may differ from the others, so packets may be received out of order. UDP does not provide any mechanism for reordering or reassembling the packets, unlike TCP, which reorders and reassembles the packets when they are received from the sender.

[box type="shadow" align="" class="" width=""]Voice and video traffic use UDP as the preferred Transport Layer protocol.[/box]

Comparison of TCP and UDP

Transmission Control Protocol (TCP)

Reliable
Uses Acknowledgments to confirm receipt of data
Re-sends data if any of the packets are lost during transmission
Delivers the data in sequential order and handles reassembly
Applications: HTTP, FTP, SMTP, Telnet.
User Datagram Protocol (UDP)

Very fast in delivery of data
Very low overhead on the network
Does not require any acknowledgment packets
If packets are lost during transmission, it does not resend any lost data
Does not deliver data in order or handle reassembly
Applications: DHCP, DNS, SNMP, TFTP, VoIP, IPTV.

[box type="shadow" align="" class="" width=""]There are protocols which uses both TCP and UDP such as DNS and SNMP.[/box]

Internet Protocol (IP)

Internet Protocol (IP), defined by RFC 791, was created for operation in interconnected systems of packet-switched computer communication networks. IP operates at the Network Layer of the Open Systems Interconnection (OSI) reference model and the Internet Layer of the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite. IP has three main characteristics:

Connectionless – The sender of the message does not know whether the recipient is available; the protocol sends the messages as is. Even if the message is successfully delivered to the intended recipient, the sender does not know whether it arrived. Since IP behaves a bit like UDP, no session is created prior to the data communication, which means the receiver is not aware of any incoming messages.

Uses Best Effort – Best Effort implies that IP is unreliable. Similarly to UDP, IP does not provide any guarantee that data is delivered between a sender and receiver. Furthermore, if any data is lost during transmission, IP has no functionality to resend the lost packets.

Media Independent – IP is independent of the type of media used for transporting the data between the sender and the receiver. Often there are many different types of media between the sender and the receiver, such as copper cables, radio frequency, fiber optic, etc.
Internet Protocol (IP) datagrams can be transported over any media type; the Data Link Layer is responsible for formatting the frame for each type of media as it leaves a device.

Thus, in this article, we learned about network port numbers and about the different protocol types in detail. If you’ve enjoyed reading this article and want to get a better understanding of the Network+ Certification, read our book, CompTIA Network+ Certification Guide.
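To make the TCP and UDP comparison in this article a little more concrete, here is a minimal sketch using Python's standard socket module on the loopback interface. The function names are hypothetical, port 0 is used so that the operating system picks a free port, and messages are assumed to be small enough to arrive in a single recv call. With TCP the client must connect first, and the operating system performs the 3-way handshake at that point; with UDP a datagram is simply sent to an address and port with no session.

```python
import socket
import threading


def tcp_echo_once(message):
    """Send `message` over a loopback TCP connection and return the echoed bytes."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        # accept() returns once a client has connected (handshake done by the OS)
        conn, _addr = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))

    worker = threading.Thread(target=serve)
    worker.start()

    # create_connection() establishes the TCP session before any data is sent
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


def udp_send_once(message):
    """Send `message` as a single UDP datagram on loopback and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(5)
    port = receiver.getsockname()[1]

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # No connection and no handshake: the datagram is just addressed and sent
    sender.sendto(message, ("127.0.0.1", port))
    data, _addr = receiver.recvfrom(1024)

    sender.close()
    receiver.close()
    return data


if __name__ == "__main__":
    print(tcp_echo_once(b"hello over TCP"))
    print(udp_send_once(b"hello over UDP"))
```

On loopback the UDP datagram normally arrives, but nothing in the protocol guarantees it, which is why the receiver sets a timeout instead of waiting forever.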
Savia Lobo
07 Dec 2017
8 min read

What are Slowly changing Dimensions (SCD) and why you need them in your Data Warehouse?

[box type="note" align="" class="" width=""]The post below is an excerpt from a book by Rahul Malewar titled Learning Informatica PowerCenter 10.x. The book is a quick guide to explore Informatica PowerCenter and its features such as working on sources, targets, transformations, performance optimization, and managing your data at speed. [/box]

Our article explores what Slowly Changing Dimensions (SCD) are and how to implement them in Informatica PowerCenter. As the name suggests, SCD allows you to maintain changes in the Dimension table in the data warehouse. These are dimensions that change gradually and unpredictably over time, rather than on a regular schedule. When you implement SCDs, you decide how you wish to maintain historical data alongside the current data. Dimensions in data warehousing and data management hold relatively static data about entities such as customers, geographical locations, products, and so on.

Here we talk about the general SCDs: SCD1, SCD2, and SCD3. Apart from these, there are also Hybrid SCDs that you might come across. A Hybrid SCD is nothing but a combination of multiple SCDs to serve complex business requirements.

Types of SCD

The various types of SCD are described as follows:

Type 1 dimension mapping (SCD1): This keeps only current data and does not maintain historical data.
Note: Use SCD1 mapping when you do not want a history of previous data.

Type 2 dimension/version number mapping (SCD2): This keeps current as well as historical data in the table. It allows you to insert new records and changed records, using a new column (PM_VERSION_NUMBER) to maintain a version number that tracks the changes. We use a new column, PM_PRIMARYKEY, to maintain the history.
Note: Use SCD2 mapping when you want to keep a full history of dimension data, and track the progression of changes using a version number.
Type 2 dimension/flag mapping (SCD2): Keeps current as well as historical data in the table. New and changed records are inserted as new rows, and a new column (PM_CURRENT_FLAG) holds a flag that tracks the changes. A new column, PM_PRIMARYKEY, is used to maintain the history.
Note: Use this SCD2 mapping when you want to keep a full history of dimension data and track the progression of changes using a flag.

Type 2 dimension/effective date range mapping (SCD2): Keeps current as well as historical data in the table. New and changed records are inserted as new rows, and two new columns (PM_BEGIN_DATE and PM_END_DATE) hold the date range that tracks the changes. A new column, PM_PRIMARYKEY, is used to maintain the history.
Note: Use this SCD2 mapping when you want to keep a full history of dimension data and track the progression of changes using a start date and an end date.

Type 3 dimension mapping (SCD3): Keeps current as well as previous data in the table. Only partial history is maintained, by adding a new column PM_PREV_COLUMN_NAME; the full history is not kept.
Note: Use SCD3 mapping when you wish to maintain only partial history.

Let's take an example to understand the different SCDs. Suppose the EMPLOYEE table has a LOCATION column and you wish to track changes in the location of employees. Consider the record for Employee ID 1001 in your EMPLOYEE dimension table. Steve was initially working in India and then moved to the USA. We want to maintain history on the LOCATION field.

EMPLOYEE_ID  NAME   LOCATION
1001         STEVE  INDIA

Your data warehouse table should reflect the current status of Steve. To implement this, we have different types of SCDs.
SCD1

As you can see in the following table, INDIA is replaced with USA, so we end up with only the current data and lose the historical data:

PM_PRIMARYKEY  EMPLOYEE_ID  NAME   LOCATION
100            1001         STEVE  USA

Now if Steve moves again, this time to JAPAN, the LOCATION value changes from USA to JAPAN:

PM_PRIMARYKEY  EMPLOYEE_ID  NAME   LOCATION
100            1001         STEVE  JAPAN

The advantage of SCD1 is that we do not consume a lot of space in maintaining the data. The disadvantage is that we don't have historical data.

SCD2 - Version number

As you can see in the following table, we maintain the full history by adding a new record for each change and keeping the previous records:

PM_PRIMARYKEY  EMPLOYEE_ID  NAME   LOCATION  PM_VERSION_NUMBER
100            1001         STEVE  INDIA     0
101            1001         STEVE  USA       1
102            1001         STEVE  JAPAN     2
200            1002         MIKE   UK        0

We add two new columns to the table: PM_PRIMARYKEY, to handle the duplicate values that now appear in the EMPLOYEE_ID column (which was supposed to be the primary key), and PM_VERSION_NUMBER, to distinguish current records from historical ones.

SCD2 - Flag

As you can see in the following table, we maintain the full history by adding a new record for each change and keeping the previous records:

PM_PRIMARYKEY  EMPLOYEE_ID  NAME   LOCATION  PM_CURRENT_FLAG
100            1001         STEVE  INDIA     0
101            1001         STEVE  USA       1

We add two new columns to the table: PM_PRIMARYKEY, to handle the duplicate values in the EMPLOYEE_ID column, and PM_CURRENT_FLAG, to distinguish current records from historical ones.
If Steve moves again, this time to JAPAN, the data looks like this:

PM_PRIMARYKEY  EMPLOYEE_ID  NAME   LOCATION  PM_CURRENT_FLAG
100            1001         STEVE  INDIA     0
101            1001         STEVE  USA       0
102            1001         STEVE  JAPAN     1

SCD2 - Date range

As you can see in the following table, we maintain the full history by adding a new record for each change and keeping the previous records:

PM_PRIMARYKEY  EMPLOYEE_ID  NAME   LOCATION  PM_BEGIN_DATE  PM_END_DATE
100            1001         STEVE  INDIA     01-01-14       31-05-14
101            1001         STEVE  USA       01-06-14       99-99-9999

We add three new columns to the table: PM_PRIMARYKEY, to handle the duplicate values in the EMPLOYEE_ID column, and PM_BEGIN_DATE and PM_END_DATE, to identify the versions in the data.

The advantage of SCD2 is that you have the complete history of the data, which is a must for a data warehouse. The disadvantage of SCD2 is that it consumes a lot of space.

SCD3

As you can see in the following table, we maintain the history by adding new columns:

PM_PRIMARYKEY  EMPLOYEE_ID  NAME   LOCATION  PM_PREV_LOCATION
100            1001         STEVE  USA       INDIA

An optional column, PM_PRIMARYKEY, can be added to maintain the primary key constraints. We add a new column, PM_PREV_LOCATION, to the table to store the changes in the data. Note that we added a new column to store the previous value, as opposed to SCD2, where we added rows to maintain history.

If Steve now moves to JAPAN, the data changes to this:

PM_PRIMARYKEY  EMPLOYEE_ID  NAME   LOCATION  PM_PREV_LOCATION
100            1001         STEVE  JAPAN     USA

As you can see, we have lost INDIA from the data warehouse, which is why we say we are maintaining only partial history.

Note: To implement SCD3, decide how many versions of a particular column you wish to maintain. Columns are added to the table based on this decision.

SCD3 is best when you are interested in maintaining only partial history rather than the complete history. The drawback of SCD3 is that it doesn't store the full history. At this point, you should be very clear about the different types of SCDs.
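To make the difference between the three approaches concrete, here is a minimal Python sketch that applies the same two location changes to an in-memory table under SCD1, SCD2 (flag variant), and SCD3. It is an illustration only, not how PowerCenter implements these mappings: the function names (apply_scd1, apply_scd2_flag, apply_scd3) and the lowercase dictionary keys are hypothetical and simply mirror the column names used above.

```python
def apply_scd1(rows, employee_id, new_location):
    """SCD1: overwrite the value in place, so no history is kept."""
    for row in rows:
        if row["employee_id"] == employee_id:
            row["location"] = new_location
    return rows


def apply_scd2_flag(rows, employee_id, new_location, next_key):
    """SCD2 (flag): retire the current row and append a new current row."""
    current = next(
        r for r in rows
        if r["employee_id"] == employee_id and r["pm_current_flag"] == 1
    )
    current["pm_current_flag"] = 0
    # Copy the retired row, then override the key, location, and flag.
    rows.append(dict(current, pm_primarykey=next_key,
                     location=new_location, pm_current_flag=1))
    return rows


def apply_scd3(rows, employee_id, new_location):
    """SCD3: keep only the previous value, in an extra column."""
    for row in rows:
        if row["employee_id"] == employee_id:
            row["pm_prev_location"] = row["location"]
            row["location"] = new_location
    return rows


# Hypothetical starting record, taken from the Steve example.
base = {"pm_primarykey": 100, "employee_id": 1001,
        "name": "STEVE", "location": "INDIA"}

scd1 = [dict(base)]
for loc in ("USA", "JAPAN"):
    apply_scd1(scd1, 1001, loc)

scd2 = [dict(base, pm_current_flag=1)]
for key, loc in ((101, "USA"), (102, "JAPAN")):
    apply_scd2_flag(scd2, 1001, loc, key)

scd3 = [dict(base, pm_prev_location=None)]
for loc in ("USA", "JAPAN"):
    apply_scd3(scd3, 1001, loc)

print([r["location"] for r in scd1])
print([(r["location"], r["pm_current_flag"]) for r in scd2])
print([(r["location"], r["pm_prev_location"]) for r in scd3])
```

After both moves, the SCD1 table holds a single JAPAN row, the SCD2 table holds three rows with only JAPAN flagged as current, and the SCD3 table holds one row with JAPAN as the location and USA as the previous location.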
We now need to implement these concepts in Informatica PowerCenter. PowerCenter provides a wizard utility for implementing SCDs, and with it you can easily implement any SCD. In the next topics, you will learn how to use the wizard to implement SCD1, SCD2, and SCD3.

Before you proceed to the next section, please make sure you have a proper understanding of the transformations in Informatica PowerCenter. You should be clear about the source qualifier, expression, filter, router, lookup, update strategy, and sequence generator transformations. The wizard creates a mapping that uses all of these transformations to implement the SCD functionality.

When we implement an SCD, there will be some new records that need to be loaded into the target table, and there will be some existing records for which we need to maintain the history.

Note: A record that arrives in the table for the first time is referred to as a NEW record, and a record for which we need to maintain history is referred to as a CHANGED record. We decide which records are NEW and which are CHANGED by comparing the source data with the target data.

To start with, we will use a sample file as our source and an Oracle table as the target to implement SCDs. Before we implement SCDs, let's talk about the logic that will serve our purpose; we will then fine-tune the logic for each type of SCD:

1. Extract all records from the source.
2. Look up on the target table, and cache all the data.
3. Compare the source data with the target data to flag the NEW and CHANGED records.
4. Filter the data based on the NEW and CHANGED flags.
5. Generate the primary key for every new row inserted into the table.
6. Load the NEW record into the table, and update the existing record if needed.

In this article, we concentrated on a very important table feature called slowly changing dimensions. We also discussed the different types of SCDs, that is, SCD1, SCD2, and SCD3.
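The comparison step in the load logic described in this article can be sketched in a few lines of Python. This is a simplified illustration, not the mapping that the wizard generates: the function classify_records, the dictionary used as a lookup cache, and the sample rows are all assumptions made for the example.

```python
def classify_records(source_rows, target_rows,
                     key="employee_id", tracked=("location",)):
    """Flag each source row as NEW or CHANGED by comparing it with the target.

    A row whose key is absent from the target is NEW. A row whose key exists
    but whose tracked columns differ is CHANGED. Identical rows are skipped.
    """
    # Look up the target table and cache its rows by business key.
    cache = {row[key]: row for row in target_rows}

    new_rows, changed_rows = [], []
    for row in source_rows:
        existing = cache.get(row[key])
        if existing is None:
            new_rows.append(row)
        elif any(row[col] != existing[col] for col in tracked):
            changed_rows.append(row)
    return new_rows, changed_rows


# Hypothetical sample data based on the EMPLOYEE example.
target = [{"employee_id": 1001, "name": "STEVE", "location": "INDIA"}]
source = [
    {"employee_id": 1001, "name": "STEVE", "location": "USA"},  # changed
    {"employee_id": 1002, "name": "MIKE", "location": "UK"},    # new
]

new_rows, changed_rows = classify_records(source, target)
print("NEW:", [r["employee_id"] for r in new_rows])
print("CHANGED:", [r["employee_id"] for r in changed_rows])
```

What happens to a CHANGED record afterwards depends on the SCD type: an overwrite for SCD1, a new row for SCD2, or a shifted column value for SCD3.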
If you are looking to explore more of Informatica PowerCenter, go ahead and check out the book Learning Informatica PowerCenter 10.x.

Amey Varangaonkar
18 Dec 2017
6 min read

9 Useful R Packages for NLP & Text Mining

The following excerpt is taken from the book Mastering Text Mining with R, co-authored by Ashish Kumar and Avinash Paul. This book lists various techniques to extract useful and high-quality information from your textual data.

There is a wide range of packages available in R for natural language processing and text mining. In this article, we present some of the popular and widely used R packages for NLP:

openNLP

openNLP is an R package that provides an interface to Apache OpenNLP, a machine-learning-based toolkit written in Java for natural language processing activities. Apache OpenNLP is widely used for the most common tasks in NLP, such as tokenization, POS tagging, named entity recognition (NER), chunking, parsing, and so on. It provides wrappers for maximum entropy (Maxent) models using the Maxent Java package. It provides functions for sentence annotation, word annotation, POS tag annotation, and annotation parsing using the Apache OpenNLP chunking parser. The Maxent chunk annotator function computes the chunk annotation using the Maxent chunker provided by OpenNLP. The Maxent entity annotator function in the R package uses the Apache OpenNLP Maxent name finder for entity annotation. Model files can be downloaded from http://opennlp.sourceforge.net/models-1.5/. These language models can be used in R by installing the openNLPmodels.language package from the repository at http://datacube.wu.ac.at. Get the openNLP package here.

RWeka

The RWeka package in R provides an interface to Weka. Weka is open source software developed by a machine learning group at the University of Waikato. It provides a wide range of machine learning algorithms, which can either be applied directly to a dataset or be called from Java code.
Different data-mining activities, such as data processing, supervised and unsupervised learning, association mining, and so on, can be performed using the RWeka package. For natural language processing, RWeka provides tokenization and stemming functions. The RWeka package provides an interface to the AlphabeticTokenizer, NGramTokenizer, and WordTokenizer functions, which tokenize into contiguous alphabetic sequences, split a string into n-grams, or perform simple word tokenization, respectively. Get started with RWeka here.

RcmdrPlugin.temis

The RcmdrPlugin.temis package in R provides a graphical integrated text-mining solution. This package can be leveraged for many text-mining tasks, such as importing and cleaning a corpus, term and document counts, term co-occurrences, correspondence analysis, and so on. Corpora can be imported from different sources and analysed using the importCorpusDlg function. The package provides flexible data source options for importing corpora, such as text files, spreadsheet files, XML, HTML files, Alceste format, and Twitter search. The Import function in this package processes the corpus and generates a term-document matrix. The package provides different functions to summarize and visualize the corpus statistics. Correspondence analysis and hierarchical clustering can be performed on the corpus. The corpusDissimilarity function helps analyse and create a cross-dissimilarity table between the term-documents present in the corpus. This package provides many functions to help users explore the corpus: for example, frequentTerms to list the most frequent terms of a corpus, specificTerms to list the terms most associated with each document, and subsetCorpusByTermsDlg to create a subset of the corpus.
Term frequency, term co-occurrence, term dictionary, temporal evolution of occurrences or term time series, term metadata variables, and corpus temporal evolution are among the other very useful functions available in this package for text mining. Download the package from its CRAN page.

tm

The tm package is a text-mining framework that provides some powerful functions to aid in text-processing steps. It has methods for importing data, handling corpora, metadata management, creation of term-document matrices, and preprocessing. To manage documents using the tm package, we create a corpus, which is a collection of text documents. There are two implementations: volatile corpus (VCorpus) and permanent corpus (PCorpus). A VCorpus is held completely in memory, and when the R object is destroyed the corpus is gone. A PCorpus is stored in the filesystem and remains available even after the R object is destroyed. The two kinds of corpus are created using the VCorpus and PCorpus functions, respectively.

This package provides a few predefined sources that can be used to import text, such as DirSource, VectorSource, and DataframeSource. The getSources method lists the available sources, and users can create their own sources. The tm package ships with several reader options: readPlain, readPDF, and readDOC. We can execute the getReaders method for an up-to-date list of available readers. To write a corpus to the filesystem, we can use writeCorpus. For inspecting a corpus, there are methods such as inspect and print. For transformations of text, such as stop-word removal, stemming, whitespace removal, and so on, we can use the tm_map, content_transformer, tolower, and stopwords("english") functions. For metadata management, meta comes in handy. The tm package provides various quantitative functions for text analysis, such as DocumentTermMatrix, findFreqTerms, findAssocs, and removeSparseTerms. Download the tm package here.
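For readers who want to see what a document-term matrix and a frequent-terms query amount to, here is a minimal sketch in plain Python rather than R. It does not use or reproduce the tm API: the helper names (tokenize, document_term_matrix, frequent_terms), the tiny stop-word set, and the sample documents are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list, not the list shipped with any package.
STOPWORDS = {"the", "a", "is", "of", "and", "in"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def document_term_matrix(docs):
    """Return (terms, matrix) with one row per document, one column per term."""
    counts = [Counter(tokenize(d)) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix


def frequent_terms(terms, matrix, min_freq):
    """Terms whose total count across all documents is at least min_freq."""
    totals = [sum(col) for col in zip(*matrix)]
    return [t for t, n in zip(terms, totals) if n >= min_freq]


docs = [
    "Text mining is the mining of text",
    "R packages make text mining easier",
]
terms, dtm = document_term_matrix(docs)
print(terms)
print(dtm)
print(frequent_terms(terms, dtm, min_freq=2))
```

Each row of the matrix is a document and each column a term, so most cells are zero once the vocabulary grows. That sparsity is the reason text-mining packages offer functions for pruning rare terms.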
languageR

languageR provides data sets and functions for statistical analysis of text data. The package contains functions for vocabulary richness, vocabulary growth, frequency spectrum, mixed-effects models, and so on. There are simulation functions available for simple regression, quasi-F factor, and Latin-square designs. Apart from that, this package can also be used for correlation, collinearity diagnostics, diagnostic visualization of logistic models, and so on.

koRpus

The koRpus package is a versatile tool for text mining that implements many functions for text readability and lexical variation. Apart from that, it can also be used for basic functions such as tokenization and POS tagging. You can find more information about its current version and dependencies here.

RKEA

The RKEA package provides an interface to KEA, a tool for keyword extraction from texts. RKEA requires a keyword extraction model, which can be created by manually indexing a small set of texts; it then uses this model to extract keywords from documents.

maxent

The maxent package in R provides tools for a low-memory implementation of multinomial logistic regression, which is also called the maximum entropy model. This package is quite helpful for classification tasks involving sparse term-document matrices, and it keeps memory consumption low on huge datasets. Download and get started with maxent.

lsa

Truncated singular value decomposition can help overcome the variability in a term-document matrix by deriving the latent features statistically. The lsa package in R provides an implementation of latent semantic analysis.

The ease of use and efficiency of R packages can be very handy when carrying out even the trickiest of text mining tasks. As a result, they have grown to become very popular in the community.

If you found this post useful, you should definitely refer to our book Mastering Text Mining with R.
It will give you ample techniques for effective text mining and analytics using the above-mentioned packages.

Natasha Mathur
11 Oct 2018
12 min read

Building your own Basic Behavior tree in Unity [Tutorial]

Behavior trees (BTs) have been steadily gaining popularity among game developers. Games such as Halo and Gears of War are among the more famous franchises to make extensive use of BTs. An abundance of computing power in PCs, gaming consoles, and mobile devices has made them a good option for implementing AI in games of all types and scopes. In this tutorial, we will look at the basics of a behavior tree and its implementation. Over the last decade, BTs have become the pattern of choice for many developers when it comes to implementing behavioral rules for their AI agents.

This tutorial is an excerpt taken from the book 'Unity 2017 Game AI programming - Third Edition' written by Raymundo Barrera, Aung Sithu Kyaw, and Thet Naing Swe.

Note: You need to have Unity 2017 installed on a system that has either Windows 7 SP1+, 8, 10 (64-bit versions) or Mac OS X 10.9+.

Let's first have a look at the basics of behavior trees.

Learning the basics of behavior trees

Behavior trees got their name from their hierarchical, branching system of nodes with a common parent, known as the root. Behavior trees mimic the real thing they are named after, in this case trees and their branching structure. If we were to visualize a behavior tree, it would look something like the following figure:

[Figure: A basic tree structure]

Of course, behavior trees can be made up of any number of nodes and child nodes. The nodes at the very end of the hierarchy are referred to as leaf nodes, just like the leaves of a tree. Nodes can represent behaviors or tests. Unlike state machines, which rely on transition rules to traverse through them, a BT's flow is defined strictly by each node's order within the larger hierarchy. A BT begins evaluating from the top of the tree (based on the preceding visualization), then continues through each child, which, in turn, runs through each of its children until a condition is met or a leaf node is reached. BTs always begin evaluating from the root node.
Evaluating the existing solutions - Unity Asset Store and others

The Unity Asset Store is an excellent resource for developers. Not only are you able to purchase art, audio, and other kinds of assets, but it is also populated with a large number of plugins and frameworks. Most relevant to our purposes, there are a number of behavior tree plugins available on the Asset Store, ranging from free to a few hundred dollars. Most, if not all, provide some sort of GUI to make visualizing and arranging the tree a fairly painless experience.

There are many advantages to going with an off-the-shelf solution from the Asset Store. Many of the frameworks include advanced functionality such as runtime (and often visual) debugging, robust APIs, serialization, and data-oriented tree support. Many even include sample leaf logic nodes to use in your game, minimizing the amount of coding you have to do to get up and running.

Some alternatives are Behavior Machine and Behavior Designer, which offer different pricing tiers (Behavior Machine even offers a free edition) and a wide array of useful features. Many other options can be found for free around the web as both generic C# and Unity-specific implementations. Ultimately, as with any other system, the choice of rolling your own or using an existing solution will depend on your time, budget, and project.

Implementing a basic behavior tree framework

Our example focuses on simple logic to highlight the functionality of the tree, rather than muddying it with complex game logic. The goal of our example is to make you feel comfortable with what can seem like an intimidating concept in game AI, and to give you the necessary tools to build your own tree and expand upon the provided code if you choose to do so.

Implementing a base Node class

There is a base functionality that needs to go into every node. Our simple framework will have all the nodes derived from a base abstract Node.cs class.
This class will provide said base functionality, or at least the signature to expand upon that functionality:

    using UnityEngine;
    using System.Collections;

    // NodeStates (FAILURE, SUCCESS, RUNNING) is defined elsewhere in the project.
    [System.Serializable]
    public abstract class Node {

        /* Delegate that returns the state of the node. */
        public delegate NodeStates NodeReturn();

        /* The current state of the node */
        protected NodeStates m_nodeState;

        public NodeStates nodeState {
            get { return m_nodeState; }
        }

        /* The constructor for the node */
        public Node() {}

        /* Implementing classes use this method to evaluate the desired set of conditions */
        public abstract NodeStates Evaluate();
    }

The class is fairly simple. Think of Node.cs as a blueprint for all the other node types to be built upon. We begin with the NodeReturn delegate, which is not used in our example, although the two members that follow it are. m_nodeState is the state of a node at any given point. As we learned earlier, it will be either FAILURE, SUCCESS, or RUNNING. The nodeState property is simply a getter for m_nodeState: the field is protected, and we don't want any other area of the code setting m_nodeState directly by accident. Next, we have an empty constructor, for the sake of being explicit, even though it is not being used. Lastly, we have the meat and potatoes of our Node.cs class: the Evaluate() method. As we'll see in the classes that implement Node.cs, Evaluate() is where the magic happens. It runs the code that determines the state of the node.
Extending nodes to selectors

To create a selector, we simply expand upon the functionality that we described in the Node.cs class:

    using UnityEngine;
    using System.Collections;
    using System.Collections.Generic;

    public class Selector : Node {
        /** The child nodes for this selector */
        protected List<Node> m_nodes = new List<Node>();

        /** The constructor requires a list of child nodes to be
         * passed in */
        public Selector(List<Node> nodes) {
            m_nodes = nodes;
        }

        /* If any of the children reports a success, the selector will
         * immediately report a success upwards. If all children fail,
         * it will report a failure instead. */
        public override NodeStates Evaluate() {
            foreach (Node node in m_nodes) {
                switch (node.Evaluate()) {
                    case NodeStates.FAILURE:
                        continue;
                    case NodeStates.SUCCESS:
                        m_nodeState = NodeStates.SUCCESS;
                        return m_nodeState;
                    case NodeStates.RUNNING:
                        m_nodeState = NodeStates.RUNNING;
                        return m_nodeState;
                    default:
                        continue;
                }
            }
            m_nodeState = NodeStates.FAILURE;
            return m_nodeState;
        }
    }

As we learned earlier, selectors are composite nodes: this means that they have one or more child nodes. These child nodes are stored in the m_nodes List<Node> variable. Although it's conceivable that one could extend the functionality of this class to allow adding more child nodes after the class has been instantiated, we initially provide this list via the constructor.

The next portion of the code is a bit more interesting, as it shows us a real implementation of the concepts we learned earlier. The Evaluate() method runs through all of its child nodes and evaluates each one individually. As a failure doesn't necessarily mean a failure for the entire selector, if one of the children returns FAILURE, we simply continue on to the next one. Conversely, if any child returns SUCCESS, then we're all set; we can set this node's state accordingly and return that value.
If we make it through the entire list of child nodes and none of them has returned SUCCESS, then we can essentially determine that the entire selector has failed, and we assign and return a FAILURE state.

Moving on to sequences

Sequences are very similar in their implementation, but as you might have guessed by now, the Evaluate() method behaves differently:

    using UnityEngine;
    using System.Collections;
    using System.Collections.Generic;

    public class Sequence : Node {
        /** Children nodes that belong to this sequence */
        private List<Node> m_nodes = new List<Node>();

        /** Must provide an initial set of children nodes to work */
        public Sequence(List<Node> nodes) {
            m_nodes = nodes;
        }

        /* If any child node returns a failure, the entire node fails. When all
         * nodes return a success, the node reports a success. */
        public override NodeStates Evaluate() {
            bool anyChildRunning = false;

            foreach (Node node in m_nodes) {
                switch (node.Evaluate()) {
                    case NodeStates.FAILURE:
                        m_nodeState = NodeStates.FAILURE;
                        return m_nodeState;
                    case NodeStates.SUCCESS:
                        continue;
                    case NodeStates.RUNNING:
                        anyChildRunning = true;
                        continue;
                    default:
                        m_nodeState = NodeStates.SUCCESS;
                        return m_nodeState;
                }
            }
            m_nodeState = anyChildRunning ? NodeStates.RUNNING : NodeStates.SUCCESS;
            return m_nodeState;
        }
    }

For a sequence to succeed, every child node must return SUCCESS. If any one of them fails during the process, the entire sequence fails, which is why we check for FAILURE first and set and report it accordingly. A SUCCESS state simply means we get to live to fight another day, and we continue on to the next child node. If any of the child nodes is in the RUNNING state and none has failed, we report RUNNING as the state of the sequence, and then the parent node or the logic driving the entire tree can evaluate it again.

Implementing a decorator as an inverter

The structure of Inverter.cs is a bit different, but it derives from Node, just like the rest of the nodes.
Let's take a look at the code and spot the differences:

    using UnityEngine;
    using System.Collections;

    public class Inverter : Node {
        /* Child node to evaluate */
        private Node m_node;

        public Node node {
            get { return m_node; }
        }

        /* The constructor requires the child node that this inverter decorator
         * wraps */
        public Inverter(Node node) {
            m_node = node;
        }

        /* Reports a success if the child fails and
         * a failure if the child succeeds. Running will report
         * as running */
        public override NodeStates Evaluate() {
            switch (m_node.Evaluate()) {
                case NodeStates.FAILURE:
                    m_nodeState = NodeStates.SUCCESS;
                    return m_nodeState;
                case NodeStates.SUCCESS:
                    m_nodeState = NodeStates.FAILURE;
                    return m_nodeState;
                case NodeStates.RUNNING:
                    m_nodeState = NodeStates.RUNNING;
                    return m_nodeState;
            }
            m_nodeState = NodeStates.SUCCESS;
            return m_nodeState;
        }
    }

As you can see, since a decorator only has one child, we don't have a List<Node>, but rather a single node variable, m_node. We pass this node in via the constructor (essentially requiring it), but there is no reason you couldn't modify this code to provide an empty constructor and a method to assign the child node after instantiation.

The Evaluate() method implements the behavior of an inverter. When the child evaluates as SUCCESS, the inverter reports a FAILURE, and when the child evaluates as FAILURE, the inverter reports a SUCCESS. The RUNNING state is reported unchanged.

Creating a generic action node

Now we arrive at ActionNode.cs, which is a generic leaf node that takes its logic in via a delegate. You are free to implement leaf nodes in any way that fits your logic, as long as they derive from Node. This particular example is equal parts flexible and restrictive.
It's flexible in the sense that it allows you to pass in any method matching the delegate signature, but it is restrictive for this very reason: it only provides one delegate signature, which doesn't take any arguments:

    using System;
    using UnityEngine;
    using System.Collections;

    public class ActionNode : Node {
        /* Method signature for the action. */
        public delegate NodeStates ActionNodeDelegate();

        /* The delegate that is called to evaluate this node */
        private ActionNodeDelegate m_action;

        /* Because this node contains no logic itself,
         * the logic must be passed in in the form of
         * a delegate. As the signature states, the action
         * needs to return a NodeStates enum */
        public ActionNode(ActionNodeDelegate action) {
            m_action = action;
        }

        /* Evaluates the node using the passed in delegate and
         * reports the resulting state as appropriate */
        public override NodeStates Evaluate() {
            switch (m_action()) {
                case NodeStates.SUCCESS:
                    m_nodeState = NodeStates.SUCCESS;
                    return m_nodeState;
                case NodeStates.FAILURE:
                    m_nodeState = NodeStates.FAILURE;
                    return m_nodeState;
                case NodeStates.RUNNING:
                    m_nodeState = NodeStates.RUNNING;
                    return m_nodeState;
                default:
                    m_nodeState = NodeStates.FAILURE;
                    return m_nodeState;
            }
        }
    }

The key to making this node work is the m_action delegate. For those familiar with C++, a delegate in C# can be thought of as a function pointer of sorts. You can also think of a delegate as a variable containing (or more accurately, pointing to) a function. This allows you to set the function to be called at runtime. The constructor requires you to pass in a method matching its signature and expects that method to return a NodeStates enum. That method can implement any logic you want, as long as these conditions are met. Unlike the other nodes we've implemented, this one doesn't fall through to any state outside of the switch itself, so it defaults to a FAILURE state. You may choose to default to a SUCCESS or RUNNING state, if you so wish, by modifying the default return.
You can easily expand on this class by deriving from it or by simply making the changes to it that you need. You can also skip this generic action node altogether and implement one-off versions of specific leaf nodes, but it's good practice to reuse as much code as possible. Just remember to derive from Node and implement the required code!

We learned the basics of how a behavior tree works, and then we created a sample behavior tree using our framework. If you found this post useful and want to learn other concepts in behavior trees, be sure to check out the book 'Unity 2017 Game AI programming - Third Edition'.
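To see how the node types fit together, here is a compact sketch of the same framework with a small tree built from it. It is written in Python only so that it can be run outside Unity; it mirrors the behavior of the C# classes in this tutorial but is not the book's code. The leaf functions (enemy_visible, attack, patrol) and the tree they form are hypothetical examples.

```python
from enum import Enum


class NodeStates(Enum):
    FAILURE = 0
    SUCCESS = 1
    RUNNING = 2


class Node:
    """Base class: every node exposes evaluate() and remembers its state."""

    def __init__(self):
        self.node_state = None  # not evaluated yet

    def evaluate(self):
        raise NotImplementedError


class ActionNode(Node):
    """Leaf node whose logic is a callable returning a NodeStates value."""

    def __init__(self, action):
        super().__init__()
        self._action = action

    def evaluate(self):
        self.node_state = self._action()
        return self.node_state


class Inverter(Node):
    """Decorator: swaps SUCCESS and FAILURE, passes RUNNING through."""

    def __init__(self, child):
        super().__init__()
        self._child = child

    def evaluate(self):
        state = self._child.evaluate()
        if state is NodeStates.SUCCESS:
            self.node_state = NodeStates.FAILURE
        elif state is NodeStates.FAILURE:
            self.node_state = NodeStates.SUCCESS
        else:
            self.node_state = NodeStates.RUNNING
        return self.node_state


class Selector(Node):
    """Composite: reports the first child state that is not FAILURE."""

    def __init__(self, children):
        super().__init__()
        self._children = children

    def evaluate(self):
        for child in self._children:
            state = child.evaluate()
            if state is not NodeStates.FAILURE:
                self.node_state = state
                return state
        self.node_state = NodeStates.FAILURE
        return self.node_state


class Sequence(Node):
    """Composite: fails on the first FAILURE, otherwise RUNNING or SUCCESS."""

    def __init__(self, children):
        super().__init__()
        self._children = children

    def evaluate(self):
        any_running = False
        for child in self._children:
            state = child.evaluate()
            if state is NodeStates.FAILURE:
                self.node_state = NodeStates.FAILURE
                return self.node_state
            if state is NodeStates.RUNNING:
                any_running = True
        self.node_state = (NodeStates.RUNNING if any_running
                           else NodeStates.SUCCESS)
        return self.node_state


log = []  # records which hypothetical actions actually ran


def enemy_visible():
    return NodeStates.FAILURE


def attack():
    log.append("attack")
    return NodeStates.SUCCESS


def patrol():
    log.append("patrol")
    return NodeStates.RUNNING


# Attack if an enemy is visible; otherwise fall back to patrolling.
tree = Selector([
    Sequence([ActionNode(enemy_visible), ActionNode(attack)]),
    ActionNode(patrol),
])
result = tree.evaluate()
print(result, log)
```

Because enemy_visible fails, the sequence fails without ever calling attack, and the selector falls through to patrol. Wrapping the check as Inverter(ActionNode(enemy_visible)) would turn that FAILURE into a SUCCESS.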
Packt
09 Feb 2016
13 min read

Working with a Webcam and Pi Camera

In this article by Ashwin Pajankar and Arush Kakkar, the authors of the book Raspberry Pi By Example, we will learn about the different types and uses of cameras with our Pi. Let's take a look at the topics we will study and implement in this article:

Working with a webcam
Crontab
Timelapse using a webcam
Webcam video recording and playback
Pi Camera and Pi NOIR comparison
Timelapse using Pi Camera
The PiCamera module in Python

Working with webcams

USB webcams are a great way to capture images and videos. Raspberry Pi supports common USB webcams. To be on the safe side, here is a list of the webcams supported by Pi: http://elinux.org/RPi_USB_Webcams. I am using a Logitech HD c310 USB Webcam. You can purchase it online, and you can find the product details and the specifications at http://www.logitech.com/en-in/product/hd-webcam-c310.

Attach your USB webcam to the Raspberry Pi through a USB port on the Pi and run the lsusb command in the terminal. This command lists all the USB devices connected to the computer. The output should be similar to the following, although the details depend on which port is used to connect the USB webcam:

    pi@raspberrypi ~/book/chapter04 $ lsusb
    Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp.
    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
    Bus 001 Device 004: ID 148f:2070 Ralink Technology, Corp. RT2070 Wireless Adapter
    Bus 001 Device 007: ID 046d:081b Logitech, Inc. Webcam C310
    Bus 001 Device 006: ID 1c4f:0003 SiGma Micro HID controller
    Bus 001 Device 005: ID 1c4f:0002 SiGma Micro Keyboard TRACER Gamma Ivory

Then, install the fswebcam utility by running the following command:

    sudo apt-get install fswebcam

fswebcam is a simple command-line utility that captures images with webcams on Linux computers.
Once the installation is done, you can use the following command to create a directory for output images:

    mkdir /home/pi/book/output

Then, run the following command to capture the image:

    fswebcam -r 1280x960 --no-banner ~/book/output/camtest.jpg

This will capture an image with a resolution of 1280 x 960. You might want to try other resolutions as an exercise. The --no-banner option disables the timestamp banner. The image will be saved with the filename given in the command. If you run this command multiple times with the same filename, the image file will be overwritten each time, so make sure that you change the filename if you want to keep previously captured images. The text output of the command should be similar to the following:

    --- Opening /dev/video0...
    Trying source module v4l2...
    /dev/video0 opened.
    No input was specified, using the first.
    --- Capturing frame...
    Corrupt JPEG data: 2 extraneous bytes before marker 0xd5
    Captured frame in 0.00 seconds.
    --- Processing captured image...
    Disabling banner.
    Writing JPEG image to '/home/pi/book/output/camtest.jpg'.

Crontab

cron is a time-based job scheduler in Unix-like computer operating systems. It is driven by a crontab (cron table) file, which is a configuration file that specifies shell commands to be run periodically on a given schedule. It is used to schedule commands or shell scripts to run periodically at a fixed time, date, or interval.
The syntax of a crontab entry that schedules a command or script is as follows:

    1 2 3 4 5 /location/command

Here, the following are the definitions:

1: Minutes (0-59)
2: Hours (0-23)
3: Day of the month (1-31)
4: Month (1-12, 1 for January)
5: Day of the week (0-7, 7 or 0 for Sunday)
/location/command: The script or command name to be scheduled

The crontab entry to run any script or command every minute is as follows:

    * * * * * /location/command 2>&1

In the next section, we will learn how to use crontab to schedule a script to capture images periodically in order to create the timelapse sequence. You can refer to this URL for more details on crontab: http://www.adminschoice.com/crontab-quick-reference.

Creating a timelapse sequence using fswebcam

Timelapse photography means capturing photographs at regular intervals and playing them back at a much higher rate than the one at which they were shot. For example, if you capture one image per minute for 10 hours, you will get 600 images. If you combine all these images in a video at 30 images per second, you will get 10 hours of footage compressed into a 20-second timelapse video. You can use your USB webcam with Raspberry Pi to achieve this. We already know how to use the Raspberry Pi with a webcam and the fswebcam utility to capture an image. The trick is to write a script that captures images with different names, add this script to crontab, and make it run at regular intervals.

Begin by creating a directory for captured images:

    mkdir /home/pi/book/output/timelapse

Open an editor of your choice, write the following code, and save it as timelapse.sh:

    #!/bin/bash
    DATE=$(date +"%Y-%m-%d_%H%M")
    fswebcam -r 1280x960 --no-banner /home/pi/book/output/timelapse/garden_$DATE.jpg

Make the script executable using:

    chmod +x timelapse.sh

This shell script captures the image and saves it with the current timestamp in its name.
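The naming scheme used by timelapse.sh and the timelapse arithmetic from this section can be checked with a short Python sketch. This is only an illustration: the helper names frame_filename and video_seconds are hypothetical and not part of the original script, while the output directory, the garden_ prefix, and the date format are taken from the text.

```python
from datetime import datetime

# Hypothetical helpers for illustration; not part of the original timelapse.sh.
OUTPUT_DIR = "/home/pi/book/output/timelapse"  # directory used in the text


def frame_filename(now):
    """Build the name the shell script produces with date +"%Y-%m-%d_%H%M"."""
    stamp = now.strftime("%Y-%m-%d_%H%M")
    return "{}/garden_{}.jpg".format(OUTPUT_DIR, stamp)


def video_seconds(capture_hours, images_per_minute, fps):
    """Return (number of images captured, playback length in seconds)."""
    images = capture_hours * 60 * images_per_minute
    return images, images / float(fps)


if __name__ == "__main__":
    # A fixed example time keeps the result reproducible.
    print(frame_filename(datetime(2015, 10, 28, 14, 5)))
    # One image per minute for 10 hours, played back at 30 images per second.
    print(video_seconds(10, 1, 30))
```

Note that the timestamp only has one-minute resolution, so two captures within the same minute would write to the same file. Scheduling the script once per minute or less often keeps every name distinct.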
Because the filename contains the timestamp, we get an image with a new filename every time. The second line in the script creates the timestamp that we're using in the filename. Run this script manually once, and make sure that the image is saved in the /home/pi/book/output/timelapse directory with the garden_<timestamp>.jpg name.

To run this script at regular intervals, we need to schedule it in crontab. The crontab entry to run our script every minute is as follows:

    * * * * * /home/pi/book/chapter04/timelapse.sh 2>&1

Open the crontab of the pi user with crontab -e. It will open crontab with nano as the editor. Add the preceding line to crontab, save it, and exit. Once you exit the editor, the terminal will show the following messages:

    no crontab for pi - using an empty one
    crontab: installing new crontab

Our timelapse webcam setup is now live. If you want to change the image capture frequency, then you have to change the crontab settings. To capture every 5 minutes, change the schedule to */5 * * * *. To capture every 2 hours, use 0 */2 * * *. Make sure that your MicroSD card has enough free space to store all the images for the time duration for which you need to keep your timelapse setup running.

Once you capture all the images, the next part is to encode them into a fast-playing video, preferably at 20 to 30 frames per second. For this part, the mencoder utility is recommended. The following are the steps to create a timelapse video with mencoder on a Raspberry Pi or any Debian/Ubuntu machine:

Install mencoder using sudo apt-get install mencoder.
Navigate to the output directory by issuing:

    cd /home/pi/book/output/timelapse

Create a list of your timelapse sequence images using:

    ls garden_*.jpg > timelapse.txt

Use the following command to create a video:

    mencoder -nosound -ovc lavc -lavcopts vcodec=mpeg4:aspect=4/3:vbitrate=8000000 -vf scale=1280:960 -o timelapse.avi -mf type=jpeg:fps=30 mf://@timelapse.txt

This will create a video named timelapse.avi in the current directory from all the images listed in timelapse.txt, at a frame rate of 30 fps. The command contains the details of the video codec, aspect ratio (4/3 here, matching the 1280x960 frames), bit rate, and scale. For more information, you can run man mencoder in the terminal. We will cover how to play a video in the next section.

Webcam video recording and playback

We can use a webcam to record live videos using avconv. Install avconv using sudo apt-get install libav-tools. Use the following command to record a video:

    avconv -f video4linux2 -r 25 -s 1280x960 -i /dev/video0 ~/book/output/VideoStream.avi

It will show the following output on the screen:
    pi@raspberrypi ~ $ avconv -f video4linux2 -r 25 -s 1280x960 -i /dev/video0 ~/book/output/VideoStream.avi
    avconv version 9.14-6:9.14-1rpi1rpi1, Copyright (c) 2000-2014 the Libav developers
    built on Jul 22 2014 15:08:12 with gcc 4.6 (Debian 4.6.3-14+rpi1)
    [video4linux2 @ 0x5d6720] The driver changed the time per frame from 1/25 to 2/15
    [video4linux2 @ 0x5d6720] Estimating duration from bitrate, this may be inaccurate
    Input #0, video4linux2, from '/dev/video0':
    Duration: N/A, start: 629.030244, bitrate: 147456 kb/s
    Stream #0.0: Video: rawvideo, yuyv422, 1280x960, 147456 kb/s, 1000k tbn, 7.50 tbc
    Output #0, avi, to '/home/pi/book/output/VideoStream.avi':
    Metadata:
    ISFT : Lavf54.20.4
    Stream #0.0: Video: mpeg4, yuv420p, 1280x960, q=2-31, 200 kb/s, 25 tbn, 25 tbc
    Stream mapping:
    Stream #0:0 -> #0:0 (rawvideo -> mpeg4)
    Press ctrl-c to stop encoding
    frame= 182 fps= 7 q=31.0 Lsize= 802kB time=7.28 bitrate= 902.4kbits/s
    video:792kB audio:0kB global headers:0kB muxing overhead 1.249878%
    Received signal 2: terminating.

You can terminate the recording sequence by pressing Ctrl + C.

We can play the video using omxplayer. It comes with the latest Raspbian, so there is no need to install it. To play the VideoStream.avi file we just recorded, use the following command:

    omxplayer ~/book/output/VideoStream.avi

It will play the video and display some output similar to the one here:

    pi@raspberrypi ~ $ omxplayer ~/book/output/VideoStream.avi
    Video codec omx-mpeg4 width 1280 height 960 profile 0 fps 25.000000
    Subtitle count: 0, state: off, index: 1, delay: 0
    V:PortSettingsChanged: 1280x960@25.00 interlace:0 deinterlace:0 anaglyph:0 par:1.00 layer:0
    have a nice day ;)

Try playing both the timelapse and the recorded videos using omxplayer.

Working with the Pi Camera and NoIR Camera Modules

These camera modules are specially manufactured for Raspberry Pi and work with all the available models.
You will need to connect the camera module to the CSI port, located behind the Ethernet port, and activate the camera using the raspi-config utility if you haven't already. You can find the video instructions to connect the camera module to Raspberry Pi at http://www.raspberrypi.org/help/camera-module-setup/. This page lists the types of camera modules available: http://www.raspberrypi.org/products/.

Two types of camera modules are available for the Pi. These are Pi Camera and Pi NoIR camera, and they can be found at https://www.raspberrypi.org/products/camera-module/ and https://www.raspberrypi.org/products/pi-noir-camera/, respectively.

[Image: Pi Camera and Pi NoIR Camera boards side by side]
[Image: the Pi Camera board connected to the Pi]
[Image: the Pi Camera board placed in the camera case]

The main difference between Pi Camera and Pi NoIR Camera is that Pi Camera gives better results in good lighting conditions, whereas Pi NoIR (NoIR stands for No Infrared, meaning the sensor has no infrared filter) is used for low light photography. To use NoIR Camera in complete darkness, we need to flood the object to be photographed with infrared light.

This is a good time to take a look at the various enclosures for Raspberry Pi models. You can find various cases available online at https://www.adafruit.com/categories/289.

[Image: an example of a Raspberry Pi case]

Using raspistill and raspivid

To capture images and videos using the Raspberry Pi camera module, we need to use the raspistill and raspivid utilities. To capture an image, run the following command:

    raspistill -o cam_module_pic.jpg

This will capture and save the image with the name cam_module_pic.jpg in the current directory. To capture a 20 second video with the camera module, run the following command:

    raspivid -o test.h264 -t 20000

This will capture and save the video with the name test.h264 in the current directory. The .h264 extension is used because raspivid writes a raw H.264 stream rather than an AVI container.
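The -t option of raspivid is given in milliseconds, which is why a 20 second clip is requested as 20000. The following minimal Python sketch shows that conversion. The helper name raspivid_args and the output filename are hypothetical and only for illustration; the sketch uses just the -o and -t options from the command above.

```python
# Hypothetical helper for illustration: builds the raspivid command line
# shown above from a duration given in seconds.
def raspivid_args(output_path, seconds):
    """Return the argument list for a raspivid recording of the given length."""
    milliseconds = int(seconds * 1000)  # raspivid's -t option is in milliseconds
    return ["raspivid", "-o", output_path, "-t", str(milliseconds)]


if __name__ == "__main__":
    # Prints the command for a 20 second recording.
    print(" ".join(raspivid_args("test.h264", 20)))
```

On a Pi with the camera enabled, the returned list could be passed to subprocess.call() to run the recording from a script.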
Unlike fswebcam and avconv, raspistill and raspivid do not write anything to the console. So, you need to check the current directory for the output. You can also run the echo $? command immediately afterwards to check whether these commands executed successfully. We can also give the complete location of the file to be saved in these commands, as shown in the following example:

    raspistill -o /home/pi/book/output/cam_module_pic.jpg

Just like fswebcam, raspistill can be used to record the timelapse sequence. In our timelapse shell script, replace the line that contains fswebcam with the appropriate raspistill command to capture the timelapse sequence and use mencoder again to create the video. This is left as an exercise for the readers.

Now, let's take a look at the images taken with the Pi camera under different lighting conditions.

[Image: normal lighting and the backlight]
[Image: only the backlight]
[Image: normal lighting and no backlight]

For NoIR camera usage at night or under low light conditions, use an IR illuminator light for better results. You can get one online.

[Image: a typical off-the-shelf LED IR illuminator suitable for our purpose]

Using picamera in Python with the Pi Camera module

picamera is a Python package that provides a programming interface to the Pi Camera module. The most recent version of Raspbian has picamera preinstalled. If you do not have it installed, you can install it using:

    sudo apt-get install python-picamera

The following program quickly demonstrates the basic usage of the picamera module to capture an image:

    import picamera
    import time

    with picamera.PiCamera() as cam:
        cam.resolution = (1024, 768)
        cam.start_preview()
        time.sleep(5)
        cam.capture('/home/pi/book/output/still.jpg')

We have to import the time and picamera modules first.
cam.start_preview() will start the preview, and time.sleep(5) will wait for 5 seconds before cam.capture() captures the image and saves it in the specified file.

There is a built-in function in picamera for timelapse photography. The following program demonstrates its usage:

    import picamera
    import time

    with picamera.PiCamera() as cam:
        cam.resolution = (1024, 768)
        cam.start_preview()
        time.sleep(3)
        for count, imagefile in enumerate(cam.capture_continuous('/home/pi/book/output/image{counter:02d}.jpg')):
            print('Capturing and saving ' + imagefile)
            time.sleep(1)
            if count == 10:
                break

In the preceding code, cam.capture_continuous() is used to capture the timelapse sequence using the Pi camera module. Check out more examples and API references for the picamera module at http://picamera.readthedocs.org/.

The Pi camera versus the webcam

Now, after using the webcam and the Pi camera, it's a good time to understand the differences, the pros, and the cons of using these.

The Pi camera board does not use a USB port and is directly interfaced to the Pi. So, it provides better performance than a webcam in terms of the frame rate and resolution. We can directly use the picamera module in Python to work on images and videos. However, the Pi camera cannot be used with any other computer.

A webcam uses a USB port for its interface, and because of that, it can be used with any computer. However, its performance is lower than that of the Pi camera in terms of the frame rate and resolution.

Summary

In this article, we learned how to use a webcam and the Pi camera. We also learned how to use utilities such as fswebcam, avconv, raspistill, raspivid, mencoder, and omxplayer. We covered how to use crontab. We used the Python picamera module to programmatically work with the Pi camera board. Finally, we compared the Pi camera and the webcam. We will be reusing all the code examples and concepts for some real-life projects soon.

Jakub Mandula
28 Oct 2015
7 min read

Making a simple Web based SSH client using Node.js and Socket.io

If you are reading this post, you probably know what SSH stands for. But just for the sake of formality, here we go: SSH stands for Secure Shell. It is a network protocol for secure access to the shell on a remote computer. You can do much more over SSH besides commanding your computer. Here you can find further information: http://en.wikipedia.org/wiki/Secure_Shell.

In this post, we are going to create a very simple web terminal. And when I say simple, I mean it! However much you like colors, it will not support them because the parsing is just beyond the scope of this post. If you want a good client-side terminal library, use term.js. It is made by the same guy who wrote pty.js, which we will be using. It is able to handle pretty much all key events and COLORS!!!!

Installation

I am going to assume you already have node and npm installed. First we will install all of the npm packages we will be using:

    npm install express pty.js socket.io

Express is a super cool web framework for Node. We are going to use it to serve our static files. I know it is a bit overkill, but I like Express.

pty.js is where the magic will be happening. It forks processes into virtual pseudo terminals and provides bindings for communication.

Socket.io is what we will use to transmit the data from the web browser to the server and back. It uses modern WebSockets, but provides fallbacks for backward compatibility. Anytime you want to create a real-time application, Socket.io is the way to go.

Planning

First things first, we need to think about what we want the program to do. We want the program to create an instance of a shell on the server (remote machine) and send all of the text to the browser. Back in the browser, we want to capture any user events and send them back to the server shell.

The WebSSH server

This is the code that will power the terminal forwarding.
Open a new file named server.js and start by importing all of the libraries:

    var express = require('express');
    var https = require('https');
    var http = require('http');
    var fs = require('fs');
    var pty = require('pty.js');

Set up express:

    // Setup the express app
    var app = express();
    // Static file serving
    app.use("/", express.static("./"));

Next we are going to create the server.

    // Creating an HTTP server
    var server = http.createServer(app).listen(8080)

If you want to use HTTPS, which you probably will, you need to generate a key and certificate and import them as shown.

    var options = {
      key: fs.readFileSync('keys/key.pem'),
      cert: fs.readFileSync('keys/cert.pem')
    };

Then use the options object to create the actual server, in place of the HTTP server above. Notice that this time we are using the https package.

    // Create an HTTPS server
    var server = https.createServer(options, app).listen(8080)

CAUTION: Even if you use HTTPS, do not use this example program on the Internet. You are not authenticating the client in any way and are thus providing a free open gate to your computer. Please make sure you only use this on your private network protected by a firewall!!!

Now bind the socket.io instance to the server:

    var io = require('socket.io')(server);

After this, we can set up the place where the magic happens.

    // When a new socket connects
    io.on('connection', function(socket){
      // Create terminal
      var term = pty.spawn('sh', [], {
        name: 'xterm-color',
        cols: 80,
        rows: 30,
        cwd: process.env.HOME,
        env: process.env
      });
      // Listen on the terminal for output and send it to the client
      term.on('data', function(data){
        socket.emit('output', data);
      });
      // Listen on the client and send any input to the terminal
      socket.on('input', function(data){
        term.write(data);
      });
      // When socket disconnects, destroy the terminal
      socket.on("disconnect", function(){
        term.destroy();
        console.log("bye");
      });
    });

In this block, all we do is wait for new connections.
When we get one, we spawn a new virtual terminal and start to pump the data from the terminal to the socket and vice versa. After the socket disconnects, we make sure to destroy the terminal.

As you may have noticed, I am using the simple sh shell. I did this mainly because I don't have a fancy prompt on it. Because we are not adding any parsing logic, my bash prompt would show up like this: ]0;piman@mothership: ~ _[01;32m✓ [33mpiman_[0m ↣ _[1;34m[~]_[37m$[0m - Eww! But you may use any shell you like.

This is all that we need on the server side. Save the file and close it.

Client side

The client side is going to be just a very simple HTML file. Start with a very simple HTML markup:

    <!doctype html>
    <html>
    <head>
      <title>SSH Client</title>
      <script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/socket.io/1.3.5/socket.io.min.js"></script>
      <script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.1.4/jquery.min.js"></script>
      <style>
        body { margin: 0; padding: 0; }
        .terminal { font-family: monospace; color: white; background: black; }
      </style>
    </head>
    <body>
      <h1>SSH</h1>
      <div class="terminal">
      </div>
      <script>
      </script>
    </body>
    </html>

I am loading the client side libraries jquery and socket.io from cdnjs. All of the client code will be written in the script tag below the terminal div.
Surprisingly, the code is very simple:

    // Connect to the socket.io server
    var socket = io.connect('http://localhost:8080');
    // Wait for data from the server
    socket.on('output', function (data) {
      // Insert line breaks where they belong (the terminal sends \r\n);
      // the /g flag replaces every occurrence, not just the first
      data = data.replace(/\r?\n/g, "<br>");
      // Append the data to our terminal
      $('.terminal').append(data);
    });
    // Listen for user input and pass it to the server
    $(document).on("keypress", function(e){
      var char = String.fromCharCode(e.which);
      socket.emit("input", char);
    });

Notice that we do not have to explicitly append the text the client types to the terminal, mainly because the server echoes it back anyway.

Now we are done! Run the server and open up the URL in your browser.

    node server.js

You should see a small prompt and be able to start typing commands. You can now explore your machine from the browser! Remember that our Web Terminal does not support Tab, Ctrl, Backspace or Esc characters. Implementing this is your homework.

Conclusion

I hope you found this tutorial useful. You can apply the knowledge in any real-time application where communication with the server is critical. All the code is available here. Please note that if you'd like to use a browser terminal, I strongly recommend term.js. It supports colors and styles and all the basic keys including Tabs, Backspace etc. I use it in my PiDashboard project. It is much cleaner and less tedious than the example I have here. I can't wait to see what amazing apps you will invent based on this.

About the Author

Jakub Mandula is a student interested in anything to do with technology, computers, mathematics or science.