Languages | 135 articles | Tech News, Tutorials & Expert Insights

07 Jul 2016

6 min read

Delphi Cookbook

07 Jul 2016

In this article by Daniele Teti author of the book Delphi Cookbook - Second Edition we will study about Multithreading. Multithreading can be your biggest problem if you cannot handle it with care. One of the fathers of the Delphi compiler used to say: "New programmers are drawn to multithreading like moths to flame, with similar results." – Danny Thorpe (For more resources related to this topic, see here.) In this chapter, we will discuss some of the main techniques to handle single or multiple background threads. We'll talk about shared resource synchronization and thread-safe queues and events. The last three recipes will talk about the Parallel Programming Library introduced in Delphi XE7, and I hope that you will love it as much as I love it. Multithreaded programming is a huge topic. So, after reading this chapter, although you will not become a master of it, you will surely be able to approach the concept of multithreaded programming with confidence and will have the basics to jump on to more specific stuff when (and if) you require them. Talking with the main thread using a thread-safe queue Using a background thread and working with its private data is not difficult, but safely bringing information retrieved or elaborated by the thread back to the main thread to show them to the user (as you know, only the main thread can handle the GUI in VCL as well as in FireMonkey) can be a daunting task. An even more complex task would be establishing a generic communication between two or more background threads. In this recipe, you'll see how a background thread can talk to the main thread in a safe manner using the TThreadedQueue<T> class. The same concepts are valid for a communication between two or more background threads. Getting ready Let's talk about a scenario. You have to show data generated from some sort of device or subsystem, let's say a serial, a USB device, a query polling on the database data, or a TCP socket. You cannot simply wait for data using TTimer because this would freeze your GUI during the wait, and the wait can be long. You have tried it, but your interface became sluggish… you need another solution! In the Delphi RTL, there is a very useful class called TThreadedQueue<T> that is, as the name suggests, a particular parametric queue (a FIFO data structure) that can be safely used from different threads. How to use it? In the programming field, there is mostly no single solution valid for all situations, but the following one is very popular. Feel free to change your approach if necessary. However, this is the approach used in the recipe code: Create the queue within the main form. Create a thread and inject the form queue to it. In the thread Execute method, append all generated data to the queue. In the main form, use a timer or some other mechanism to periodically read from the queue and display data on the form. How to do it… Open the recipe project called ThreadingQueueSample.dproj. This project contains the main form with all the GUI-related code and another unit with the thread code. The FormCreate event creates the shared queue with the following parameters that will influence the behavior of the queue: QueueDepth = 100: This is the maximum queue size. If the queue reaches this limit, all the push operations will be blocked for a maximum of PushTimeout, then the Push call will fail with a timeout. PushTimeout = 1000: This is the timeout in milliseconds that will affect the thread, that in this recipe is the producer of a producer/consumer pattern. PopTimeout = 1: This is the timeout in milliseconds that will affect the timer when the queue is empty. This timeout must be very short because the pop call is blocking in nature, and you are in the main thread that should never be blocked for a long time. The button labeled Start Thread creates a TReaderThread instance passing the already created queue to its constructor (this is a particular type of dependency injection called constructor injection). The thread declaration is really simple and is as follows: type TReaderThread = class(TThread) private FQueue: TThreadedQueue<Byte>; protected procedure Execute; override; public constructor Create(AQueue: TThreadedQueue<Byte>); end; While the Execute method simply appends randomly generated data to the queue, note that the Terminated property must be checked often so the application can terminate the thread and wait a reasonable time for its actual termination. In the following example, if the queue is not empty, check the termination at least every 700 msec ca: procedure TReaderThread.Execute; begin while not Terminated do begin TThread.Sleep(200 + Trunc(Random(500))); // e.g. reading from an actual device FQueue.PushItem(Random(256)); end; end; So far, you've filled the queue. Now, you have to read from the queue and do something useful with the read data. This is the job of a timer. The following is the code of the timer event on the main form: procedure TMainForm.Timer1Timer(Sender: TObject); var Value: Byte; begin while FQueue.PopItem(Value) = TWaitResult.wrSignaled do begin ListBox1.Items.Add(Format('[%3.3d]', [Value])); end; ListBox1.ItemIndex := ListBox1.Count - 1; end; That's it! Run the application and see how we are reading the data coming from the threads and showing the main form. The following is a screenshot: The main form showing data generated by the background thread There's more… The TThreadedQueue<T> is very powerful and can be used to communicate between two or more background threads in a consumer/producer schema as well. You can use multiple producers, multiple consumers, or both. The following screenshot shows a popular schema used when the speed at which the data generated is faster than the speed at which the same data is handled. In this case, usually you can gain speed on the processing side using multiple consumers. Single producer, multiple consumers Summary In this article we had a look at how to talk to the main thread using a thread-safe queue. Resources for Article: Further resources on this subject: Exploring the Usages of Delphi[article] Adding Graphics to the Map[article] Application Performance[article]

0
0
4178

article-image-practical-big-data-exploration-spark-and-python

Anant Asthana

06 Jun 2016

6 min read

Practical Big Data Exploration with Spark and Python

Anant Asthana

06 Jun 2016

6 min read

The reader of this post should be familiar with basic concepts of Spark, such as the shell and RDDs. Data sizes have increased, but our exploration tools and techniques have not evolved as fast. Traditional Hadoop Map Reduce jobs are cumbersome and time consuming to develop. Also, Pig isn't quite as fully featured and easy to work with. Exploration can mean parsing/analyzing raw text documents, analyzing log files, processing tabular data in various formats, and exploring data that may or may not be correctly formatted. This is where a tool like Spark excels. It provides an interactive shell for quick processing, prototyping, exploring, and slicing and dicing data. Spark works with R, Scala, and Python. In conjunction with Jupyter notebooks, we get a clean web interface to write out python, R, or Scala code backed by a Spark cluster. Jupyter notebook is also a great tool for presenting our findings, since we can do inline visualizations and easily share them as a PDF on GitHub or through a web viewer. The power of this set up is that we make Spark do the heavy lifting while still having the flexibility to test code on a small subset of data via the interactive notebooks. Another powerful capability of Spark is its Data Frames API. After we have cleaned our data (dealt with badly formatted rows that can't be loaded correctly), we can load it as a Data Frame. Once the data is a loaded as a Data Frame, we can use the Spark SQL to explore the data. Since notebooks can be shared, this is also a great way to let the developers do the work of cleaning the data and loading it as a Data Frame. Analysts, data scientists, and the likes can then use this data for their tasks. Data Frames can also be exported as Hive tables, which are commonly used in Hadoop-based warehouses. Examples: For this section, we will be using examples that I have uploaded on GitHub. These examples can be found at here. In addition to the examples, there is also a Docker container for running these examples that have been provided. The container runs Spark in a pseudo-distributed mode, and has Jupyter notebook configured with to run Python/PYspark. The basics: To set this up, in your environment, you need a running spark cluster with Jupyter notebook installed. Jupyter notebook, by default, only has the Python kernel configured. You can download additional kernels for Jupyter notebook to run R and Scala. To run Jupyter notebook with Pyspark, use the following command on your cluster: IPYTHON_OPTS="notebook --pylab inline --notebook-dir=<directory sto store notebooks>" MASTER=local[6] ./bin/pyspark When you start Jupyter notebook in the way we mentioned earlier, it initializes a few critical variables. One of them is the Spark Context (sc), which is used to interact with all spark-related tasks. The other is sqlContext, which is the Spark SQL context. This is used to interact with Spark SQL (create Data Frames, run queries, and so on). You need to understand the following: Log Analysis In this example, we use a log file from Apache Server. The code for this example can be found at here. We load our log file in question using: log_file = sc.textFile("../data/log_file.txt") Spark can load files from HDFS, local filesystem, and S3 natively. Other storage formats libraries can be found freely on the Internet, or you could write you own formats (Blog post for another time). The previous command loads the log file. We then use Python’s native shlex library to split the file into different fields and use the Sparks map command to load them as a Row. An RDD consisting of rows can easily be registered as a DataFrame. How we arrived at this solution is where data exploration comes in. We use the Sparks takeSample method to sample the file and get five rows: log_file.takeSample(True, 5) These sample rows are helpful in determining how to parse and load the file. Once we have written our code to load the file, we can apply it to the dataset using map to create a new RDD consisting of Rows to test code on a subset of data in a similar manner using the take or takeSample methods. The take method sequentially reads rows from the file, so although it is faster, it may not be a good representation of the dataset. The take sample method on the other hand randomly picks sample rows from the file; this has a better representation. To create the new RDD and register it as a DataFrame, we use the following code: schema_DF = splits.map(create_schema).toDF() Once we have created the DataFrame and tested it using take/takeSample to make sure that our loading code is working, we can register it as a table using the following: sqlCtx.registerDataFrameAsTable(schema_DF, 'logs') Once it is registered as a table, we can run SQL queries on the log file: sample = sqlCtx.sql('SELECT * FROM logs LIMIT 10').collect() Note that the collect() method collects the result to the driver’s memory so this may not be feasible for large datasets. Use take/takeSample instead to sample data if your dataset is large. The beauty of using Spark with Jupyter is that all this exploration work takes only a few lines of code. It can be written interactively with all the trial and error we needed, the processed data can be easily shared, and running interactive queries on this data is easy. Last but not least, this can easily scale to massive (GB, TB) data sets. k-means on the Iris dataset In this example, we use data from the Iris dataset, which contains measurements of sepal and petal length and width. This is a popular open source dataset used to showcase classification algorithms. In this case, we use Spark’s k-Means algorithm from the MLlib library of Spark. MLlib is Spark’s machine learning library. The code and the output can be found at here. In this example, we are not going to get into too much detail since some of the concepts are outside the scope of this blog post. This example showcases how we load the Iris dataset and create a DataFrame with it. We then train a k-means classifier on this dataset, and then we visualize our classification results. The power of this is that we did a somewhat complex task of parsing a dataset, creating a DataFrame, training a machine learning classifier, and visualizing the data in an interactive and scalable manner. The repository contains several more examples. Feel free to reach out to me if you have any questions. If you would like to see more posts with practical examples, please let us know. About the Author Anant Asthana is a data scientist and principal architect at Pythian, and he can be found on Github at anantasty.

0
0
3105

Packt

17 May 2016

19 min read

Expert Python Programming: Interfaces

Packt

17 May 2016

19 min read

0
0
6458

article-image-testing-your-application-cljstest

Packt

11 May 2016

13 min read

Testing Your Application with cljs.test

Packt

11 May 2016

13 min read

In this article written by David Jarvis, Rafik Naccache, and Allen Rohner, authors of the book Learning ClojureScript, we'll take a look at how to configure our ClojureScript application or library for testing. As usual, we'll start by creating a new project for us to play around with: $ lein new figwheel testing (For more resources related to this topic, see here.) We'll be playing around in a test directory. Most JVM Clojure projects will have one already, but since the default Figwheel template doesn't include a test directory, let's make one first (following the same convention used with source directories, that is, instead of src/$PROJECT_NAME we'll create test/$PROJECT_NAME): $ mkdir –p test/testing We'll now want to make sure that Figwheel knows that it has to watch the test directory for file modifications. To do that, we will edit the the dev build in our project.clj project's :cljsbuild map so that it's :source-paths vector includes both src and test. Your new dev build configuration should look like the following: {:id "dev" :source-paths ["src" "test"] ;; If no code is to be run, set :figwheel true for continued automagical reloading :figwheel {:on-jsload "testing.core/on-js-reload"} :compiler {:main testing.core :asset-path "js/compiled/out" :output-to "resources/public/js/compiled/testing.js" :output-dir "resources/public/js/compiled/out" :source-map-timestamp true}} Next, we'll get the old Figwheel REPL going so that we can have our ever familiar hot reloading: $ cd testing $ rlwrap lein figwheel Don't forget to navigate a browser window to http://localhost:3449/ to get the browser REPL to connect. Now, let's create a new core_test.cljs file in the test/testing directory. By convention, most libraries and applications in Clojure and ClojureScript have test files that correspond to source files with the suffix _test. In this project, this means that test/testing/core_test.cljs is intended to contain the tests for src/testing/core.cljs. Let's get started by just running tests on a single file. Inside core_test.cljs, let's add the following code: (ns testing.core-test (:require [cljs.test :refer-macros [deftest is]])) (deftest i-should-fail (is (= 1 0))) (deftest i-should-succeed (is (= 1 1))) This code first requires two of the most important cljs.test macros, and then gives us two simple examples of what a failed test and a successful test should look like. At this point, we can run our tests from the Figwheel REPL: cljs.user=> (require 'testing.core-test) ;; => nil cljs.user=> (cljs.test/run-tests 'testing.core-test) Testing testing.core-test FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14) expected: (= 1 0) actual: (not (= 1 0)) Ran 2 tests containing 2 assertions. 1 failures, 0 errors. ;; => nil At this point, what we've got is tolerable, but it's not really practical in terms of being able to test a larger application. We don't want to have to test our application in the REPL and pass in our test namespaces one by one. The current idiomatic solution for this in ClojureScript is to write a separate test runner that is responsible for important executions and then run all of your tests. Let's take a look at what this looks like. Let's start by creating another test namespace. Let's call this one app_test.cljs, and we'll put the following in it: (ns testing.app-test (:require [cljs.test :refer-macros [deftest is]])) (deftest another-successful-test (is (= 4 (count "test")))) We will not do anything remarkable here; it's just another test namespace with a single test that should pass by itself. Let's quickly make sure that's the case at the REPL: cljs.user=> (require 'testing.app-test) nil cljs.user=> (cljs.test/run-tests 'testing.app-test) Testing testing.app-test Ran 1 tests containing 1 assertions. 0 failures, 0 errors. ;; => nil Perfect. Now, let's write a test runner. Let's open a new file that we'll simply call test_runner.cljs, and let's include the following: (ns testing.test-runner (:require [cljs.test :refer-macros [run-tests]] [testing.app-test] [testing.core-test])) ;; This isn't strictly necessary, but is a good idea depending ;; upon your application's ultimate runtime engine. (enable-console-print!) (defn run-all-tests [] (run-tests 'testing.app-test 'testing.core-test)) Again, nothing surprising. We're just making a single function for us that runs all of our tests. This is handy for us at the REPL: cljs.user=> (testing.test-runner/run-all-tests) Testing testing.app-test Testing testing.core-test FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14) expected: (= 1 0) actual: (not (= 1 0)) Ran 3 tests containing 3 assertions. 1 failures, 0 errors. ;; => nil Ultimately, however, we want something we can run at the command line so that we can use it in a continuous integration environment. There are a number of ways we can go about configuring this directly, but if we're clever, we can let someone else do the heavy lifting for us. Enter doo, the handy ClojureScript testing plugin for Leiningen. Using doo for easier testing configuration doo is a library and Leiningen plugin for running cljs.test in many different JavaScript environments. It makes it easy to test your ClojureScript regardless of whether you're writing for the browser or for the server, and it also includes file watching capabilities such as Figwheel so that you can automatically rerun tests on file changes. The doo project page can be found at https://github.com/bensu/doo. To configure our project to use doo, first we need to add it to the list of plugins in our project.clj file. Modify the :plugins key so that it looks like the following: :plugins [[lein-figwheel "0.5.2"] [lein-doo "0.1.6"] [lein-cljsbuild "1.1.3" :exclusions [[org.clojure/clojure]]]] Next, we will add a new cljsbuild build configuration for our test runner. Add the following build map after the dev build map on which we've been working with until now: {:id "test" :source-paths ["src" "test"] :compiler {:main testing.test-runner :output-to "resources/public/js/compiled/testing_test.js" :optimizations :none}} This configuration tells Cljsbuild to use both our src and test directories, just like our dev profile. It adds some different configuration elements to the compiler options, however. First, we're not using testing.core as our main namespace anymore—instead, we'll use our test runner's namespace, testing.test-runner. We will also change the output JavaScript file to a different location from our compiled application code. Lastly, we will make sure that we pass in :optimizations :none so that the compiler runs quickly and doesn't have to do any magic to look things up. Note that our currently running Figwheel process won't know about the fact that we've added lein-doo to our list of plugins or that we've added a new build configuration. If you want to make Figwheel aware of doo in a way that'll allow them to play nicely together, you should also add doo as a dependency to your project. Once you've done that, exit the Figwheel process and restart it after you've saved the changes to project.clj. Lastly, we need to modify our test runner namespace so that it's compatible with doo. To do this, open test_runner.cljs and change it to the following: (ns testing.test-runner (:require [doo.runner :refer-macros [doo-tests]] [testing.app-test] [testing.core-test])) ;; This isn't strictly necessary, but is a good idea depending ;; upon your application's ultimate runtime engine. (enable-console-print!) (doo-tests 'testing.app-test 'testing.core-test) This shouldn't look too different from our original test runner—we're just importing from doo.runner rather than cljs.test and using doo-tests instead of a custom runner function. The doo-tests runner works very similarly to cljs.test/run-tests, but it places hooks around the tests to know when to start them and finish them. We're also putting this at the top-level of our namespace rather than wrapping it in a particular function. The last thing we're going to need to do is to install a JavaScript runtime that we can use to execute our tests with. Up until now, we've been using the browser via Figwheel, but ideally, we want to be able to run our tests in a headless environment as well. For this purpose. we recommend installing PhantomJS (though other execution environments are also fine). If you're on OS X and have Homebrew installed (http://www.brew.sh), installing PhantomJS is as simple as typing brew install phantomjs. If you're not on OS X or don't have Homebrew, you can find instructions on how to install PhantomJS on the project's website at http://phantomjs.org/. The key thing is that the following should work: $ phantomjs -v 2.0.0 Once you've got PhantomJS installed, you can now invoke your test runner from the command line with the following: $ lein doo phantom test once ;; ====================================================================== ;; Testing with Phantom: Testing testing.app-test Testing testing.core-test FAIL in (i-should-fail) (:) expected: (= 1 0) actual: (not (= 1 0)) Ran 3 tests containing 3 assertions. 1 failures, 0 errors. Subprocess failed Let's break down this command. The first part, lein doo, just tells Leiningen to invoke the doo plugin. Next, we have phantom, which tells doo to use PhantomJS as its running environment. The doo plugin supports a number of other environments, including Chrome, Firefox, Internet Explorer, Safari, Opera, SlimerJS, NodeJS, Rhino, and Nashorn. Be aware that if you're interested in running doo on one of these other environments, you may have to configure and install additional software. For instance, if you want to run tests on Chrome, you'll need to install Karma as well as the appropriate Karma npm modules to enable Chrome interaction. Next we have test, which refers to the cljsbuild build ID we set up earlier. Lastly, we have once, which tells doo to just run tests and not to set up a filesystem watcher. If, instead, we wanted doo to watch the filesystem and rerun tests on any changes, we would just use lein doo phantom test. Testing fixtures The cljs.test project has support for adding fixtures to your tests that can run before and after your tests. Test fixtures are useful for establishing isolated states between tests—for instance, you can use tests to set up a specific database state before each test and to tear it down afterward. You can add them to your ClojureScript tests by declaring them with the use-fixtures macro within the testing namespace you want fixtures applied to. Let's see what this looks like in practice by changing one of our existing tests and adding some fixtures to it. Modify app-test.cljs to the following: (ns testing.app-test (:require [cljs.test :refer-macros [deftest is use-fixtures]])) ;; Run these fixtures for each test. ;; We could also use :once instead of :each in order to run ;; fixtures once for the entire namespace instead of once for ;; each individual test. (use-fixtures :each {:before (fn [] (println "Setting up tests...")) :after (fn [] (println "Tearing down tests..."))}) (deftest another-successful-test ;; Give us an idea of when this test actually executes. (println "Running a test...") (is (= 4 (count "test")))) Here, we've added a call to use-fixtures that prints to the console before and after running the test, and we've added a println call to the test itself so that we know when it executes. Now when we run this test, we get the following: $ lein doo phantom test once ;; ====================================================================== ;; Testing with Phantom: Testing testing.app-test Setting up tests... Running a test... Tearing down tests... Testing testing.core-test FAIL in (i-should-fail) (:) expected: (= 1 0) actual: (not (= 1 0)) Ran 3 tests containing 3 assertions. 1 failures, 0 errors. Subprocess failed Note that our fixtures get called in the order we expect them to. Asynchronous testing Due to the fact that client-side code is frequently asynchronous and JavaScript is single threaded, we need to have a way to support asynchronous tests. To do this, we can use the async macro from cljs.test. Let's take a look at an example using an asynchronous HTTP GET request. First, let's modify our project.clj file to add cljs-ajax to our dependencies. Our dependencies project key should now look something like this: :dependencies [[org.clojure/clojure "1.8.0"] [org.clojure/clojurescript "1.7.228"] [cljs-ajax "0.5.4"] [org.clojure/core.async "0.2.374" :exclusions [org.clojure/tools.reader]]] Next, let's create a new async_test.cljs file in our test.testing directory. Inside it, we will add the following code: (ns testing.async-test (:require [ajax.core :refer [GET]] [cljs.test :refer-macros [deftest is async]])) (deftest test-async (GET "http://www.google.com" ;; will always fail from PhantomJS because ;; `Access-Control-Allow-Origin` won't allow ;; our headless browser to make requests to Google. {:error-handler (fn [res] (is (= (:status-text res) "Request failed.")) (println "Test finished!"))})) Note that we're not using async in our test at the moment. Let's try running this test with doo (don't forget that you have to add testing.async-test to test_runner.cljs!): $ lein doo phantom test once ... Testing testing.async-test ... Ran 4 tests containing 3 assertions. 1 failures, 0 errors. Subprocess failed Now, our test here passes, but note that the println async code never fires, and our additional assertion doesn't get called (looking back at our previous examples, since we've added a new is assertion we should expect to see four assertions in the final summary)! If we actually want our test to appropriately validate the error-handler callback within the context of the test, we need to wrap it in an async block. Doing so gives us a test that looks like the following: (deftest test-async (async done (GET "http://www.google.com" ;; will always fail from PhantomJS because ;; `Access-Control-Allow-Origin` won't allow ;; our headless browser to make requests to Google. {:error-handler (fn [res] (is (= (:status-text res) "Request failed.")) (println "Test finished!") (done))}))) Now, let's try to run our tests again: $ lein doo phantom test once ... Testing testing.async-test Test finished! ... Ran 4 tests containing 4 assertions. 1 failures, 0 errors. Subprocess failed Awesome! Note that this time we see the printed statement from our callback, and we can see that cljs.test properly ran all four of our assertions. Asynchronous fixtures One final "gotcha" on testing—the fixtures we talked about earlier in this article do not handle asynchronous code automatically. This means that if you have a :before fixture that executes asynchronous logic, your test can begin running before your fixture has completed! In order to get around this, all you need to do is to wrap your :before fixture in an async block, just like with asynchronous tests. Consider the following for instance: (use-fixtures :once {:before #(async done ... (done)) :after #(do ...)}) Summary This concludes our section on cljs.test. Testing, whether in ClojureScript or any other language, is a critical software engineering best practice to ensure that your application behaves the way you expect it to and to protect you and your fellow developers from accidentally introducing bugs to your application. With cljs.test and doo, you have the power and flexibility to test your ClojureScript application with multiple browsers and JavaScript environments and to integrate your tests into a larger continuous testing framework. Resources for Article: Further resources on this subject: Clojure for Domain-specific Languages - Design Concepts with Clojure [article] Visualizing my Social Graph with d3.js [article] Improving Performance with Parallel Programming [article]

0
0
4613

Packt

19 Feb 2016

8 min read

How is Python code organized

Packt

19 Feb 2016

8 min read

Python is an easy to learn yet a powerful programming language. It has efficient high-level data structures and effective approach to object-oriented programming. Let's talk a little bit about how Python code is organized. In this paragraph, we'll start going down the rabbit hole a little bit more and introduce a bit more technical names and concepts. Starting with the basics, how is Python code organized? Of course, you write your code into files. When you save a file with the extension .py, that file is said to be a Python module. If you're on Windows or Mac, which typically hide file extensions to the user, please make sure you change the configuration so that you can see the complete name of the files. This is not strictly a requirement, but a hearty suggestion. It would be impractical to save all the code that it is required for software to work within one single file. That solution works for scripts, which are usually not longer than a few hundred lines (and often they are quite shorter than that). A complete Python application can be made of hundreds of thousands of lines of code, so you will have to scatter it through different modules. Better, but not nearly good enough. It turns out that even like this it would still be impractical to work with the code. So Python gives you another structure, called package, which allows you to group modules together. A package is nothing more than a folder, which must contain a special file, __init__.py that doesn't need to hold any code but whose presence is required to tell Python that the folder is not just some folder, but it's actually a package (note that as of Python 3.3 __init__.py is not strictly required any more). As always, an example will make all of this much clearer. I have created an example structure in my project, and when I type in my Linux console: $ tree -v example Here's how a structure of a real simple application could look like: example/ ├── core.py ├── run.py └── util ├── __init__.py ├── db.py ├── math.py └── network.py You can see that within the root of this example, we have two modules, core.py and run.py, and one package: util. Within core.py, there may be the core logic of our application. On the other hand, within the run.py module, we can probably find the logic to start the application. Within the util package, I expect to find various utility tools, and in fact, we can guess that the modules there are called by the type of tools they hold: db.py would hold tools to work with databases, math.py would of course hold mathematical tools (maybe our application deals with financial data), and network.py would probably hold tools to send/receive data on networks. As explained before, the __init__.py file is there just to tell Python that util is a package and not just a mere folder. Had this software been organized within modules only, it would have been much harder to infer its structure. I put a module only example under the ch1/files_only folder, see it for yourself: $ tree -v files_only This shows us a completely different picture: files_only/ ├── core.py ├── db.py ├── math.py ├── network.py └── run.py It is a little harder to guess what each module does, right? Now, consider that this is just a simple example, so you can guess how much harder it would be to understand a real application if we couldn't organize the code in packages and modules. How do we use modules and packages When a developer is writing an application, it is very likely that they will need to apply the same piece of logic in different parts of it. For example, when writing a parser for the data that comes from a form that a user can fill in a web page, the application will have to validate whether a certain field is holding a number or not. Regardless of how the logic for this kind of validation is written, it's very likely that it will be needed in more than one place. For example in a poll application, where the user is asked many question, it's likely that several of them will require a numeric answer. For example: What is your age How many pets do you own How many children do you have How many times have you been married It would be very bad practice to copy paste (or, more properly said: duplicate) the validation logic in every place where we expect a numeric answer. This would violate the DRY (Don't Repeat Yourself) principle, which states that you should never repeat the same piece of code more than once in your application. I feel the need to stress the importance of this principle: you should never repeat the same piece of code more than once in your application (got the irony?). There are several reasons why repeating the same piece of logic can be very bad, the most important ones being: There could be a bug in the logic, and therefore, you would have to correct it in every place that logic is applied. You may want to amend the way you carry out the validation, and again you would have to change it in every place it is applied. You may forget to fix/amend a piece of logic because you missed it when searching for all its occurrences. This would leave wrong/inconsistent behavior in your application. Your code would be longer than needed, for no good reason. Python is a wonderful language and provides you with all the tools you need to apply all the coding best practices. For this particular example, we need to be able to reuse a piece of code. To be able to reuse a piece of code, we need to have a construct that will hold the code for us so that we can call that construct every time we need to repeat the logic inside it. That construct exists, and it's called function. I'm not going too deep into the specifics here, so please just remember that a function is a block of organized, reusable code which is used to perform a task. Functions can assume many forms and names, according to what kind of environment they belong to, but for now this is not important. Functions are the building blocks of modularity in your application, and they are almost indispensable (unless you're writing a super simple script, you'll use functions all the time). Python comes with a very extensive library, as I already said a few pages ago. Now, maybe it's a good time to define what a library is: a library is a collection of functions and objects that provide functionalities that enrich the abilities of a language. For example, within Python's math library we can find a plethora of functions, one of which is the factorial function, which of course calculates the factorial of a number. In mathematics, the factorial of a non-negative integer number N, denoted as N!, is defined as the product of all positive integers less than or equal to N. For example, the factorial of 5 is calculated as: 5! = 5 * 4 * 3 * 2 * 1 = 120 The factorial of 0 is 0! = 1, to respect the convention for an empty product. So, if you wanted to use this function in your code, all you would have to do is to import it and call it with the right input values. Don't worry too much if input values and the concept of calling is not very clear for now, please just concentrate on the import part. We use a library by importing what we need from it, and then we use it. In Python, to calculate the factorial of number 5, we just need the following code: >>> from math import factorial >>> factorial(5) 120 Whatever we type in the shell, if it has a printable representation, will be printed on the console for us (in this case, the result of the function call: 120). So, let's go back to our example, the one with core.py, run.py, util, and so on. In our example, the package util is our utility library. Our custom utility belt that holds all those reusable tools (that is, functions), which we need in our application. Some of them will deal with databases (db.py), some with the network (network.py), and some will perform mathematical calculations (math.py) that are outside the scope of Python's standard math library and therefore, we had to code them for ourselves. Summary In this article, we started to explore the world of programming and that of Python. We saw how Python code can be organized using modules and packages. For more information on Python, refer the following books recomended by Packt Publishing: Learning Python (https://www.packtpub.com/application-development/learning-python) Python 3 Object-oriented Programming - Second Edition (https://www.packtpub.com/application-development/python-3-object-oriented-programming-second-edition) Python Essentials (https://www.packtpub.com/application-development/python-essentials) Resources for Article: Further resources on this subject: Test all the things with Python [article] Putting the Fun in Functional Python [article] Scraping the Web with Python - Quick Start [article]

0
0
3860

Packt

16 Feb 2016

19 min read

Unlocking the JavaScript Core

Packt

16 Feb 2016

19 min read

You may have owned an iPhone for years and regard yourself as an experienced user. At the same time, you keep removing unwanted characters one at a time while typing by pressing delete. However, one day you find out that a quick shake allows you to delete the whole message in one tap. Then you wonder why on earth you didn't know this earlier. The same thing happens with programming. We can be quite satisfied with our coding until, all of sudden, we run into a trick or a lesser-known language feature that makes us reconsider the entire work done over the years. It turns out that we could do this in a cleaner, more readable, more testable, and more maintainable way. So it's presumed that you already have experience with JavaScript; however, this article equips you with the best practices to improve your code. (For more resources related to this topic, see here.) We will cover the following topics: Making your code readable and expressive Mastering multiline strings in JavaScript Manipulating arrays in the ES5 way Traversing an object in an elegant, reliable, safe, and fast way The most effective way of declaring objects How to magic methods in JavaScript Make your code readable and expressive There are numerous practices and heuristics to make a code more readable, expressive, and clean. We will cover this topic later on, but here we will talk about syntactic sugar. The term means an alternative syntax that makes the code more expressive and readable. In fact, we already had some of this in JavaScript from the very beginning. For instance, the increment/decrement and addition/subtraction assignment operators inherited from C. foo++ is syntactic sugar for foo = foo + 1, and foo += bar is a shorter form for foo = foo + bar. Besides, we have a few tricks that serve the same purpose. JavaScript applies logical expressions to so-called short-circuit evaluation. This means that an expression is read left to right, but as soon as the condition result is determined at an early stage, the expression tail is not evaluated. If we have true || false || false, the interpreter will know from the first test that the result is true regardless of other tests. So the false || false part is not evaluated, and this opens a way for creativity. Function argument default value When we need to specify default values for parameters we can do like that: function stub( foo ) { return foo || "Default value"; } console.log( stub( "My value" ) ); // My value console.log( stub() ); // Default value What is going on here? When foo is true (not undefined, NaN, null, false, 0, or ""), the result of the logical expression is foo otherwise the expression is evaluated until Default value and this is the final result. Starting with 6th edition of EcmaScript (specification of JavaScript language) we can use nicer syntax: function stub( foo = "Default value" ) { return foo; } Conditional invocation While composing our code we shorten it on conditions:" var age = 20; age >= 18 && console.log( "You are allowed to play this game" ); age >= 18 || console.log( "The game is restricted to 18 and over" ); In the preceding example, we used the AND (&&) operator to invoke console.log if the left-hand condition is Truthy. The OR (||) operator does the opposite, it calls console.log if the condition is Falsy. I think the most common case in practice is the shorthand condition where the function is called only when it is provided: /** * @param {Function} [cb] - callback */ function fn( cb ) { cb && cb(); }; The following is one more example on this: /** * @class AbstractFoo */ AbstractFoo = function(){ // call this.init if the subclass has init method this.init && this.init(); }; Syntactic sugar was introduced to its full extent to the JavaScript world only with the advance in CoffeeScript, a subset of the language that trans-compiles (compiles source-to-source) into JavaScript. Actually CoffeeScript, inspired by Ruby, Python, and Haskell, has unlocked arrow-functions, spreads, and other syntax to JavaScript developers. In 2011, Brendan Eich (the author of JavaScript) admitted that CoffeeScript influenced him in his work on EcmaScript Harmony, which was finalized this summer in ECMA-262 6th edition specification. From a marketing perspective, the specification writers agreed on using a new name convention that calls the 6th edition as EcmaScript 2015 and the 7th edition as EcmaScript 2016. Yet the community is used to abbreviations such as ES6 and ES7. To avoid confusion further in the book, we will refer to the specifications by these names. Now we can look at how this affects the new JavaScript. Arrow functions Traditional function expression may look like this: function( param1, param2 ){ /* function body */ } When declaring an expression using the arrow function (aka fat arrow function) syntax, we will have this in a less verbose form, as shown in the following: ( param1, param2 ) => { /* function body */ } In my opinion, we don't gain much with this. But if we need, let's say, an array method callback, the traditional form would be as follows: function( param1, param2 ){ return expression; } Now the equivalent arrow function becomes shorter, as shown here: ( param1, param2 ) => expression We may do filtering in an array this way: // filter all the array elements greater than 2 var res = [ 1, 2, 3, 4 ].filter(function( v ){ return v > 2; }) console.log( res ); // [3,4] Using an array function, we can do filtering in a cleaner form: var res = [ 1, 2, 3, 4 ].filter( v => v > 2 ); console.log( res ); // [3,4] Besides shorter function declaration syntax, the arrow functions bring the so called lexical this. Instead of creating its own context, it uses the context of the surrounding object as shown here: "use strict"; /** * @class View */ let View = function(){ let button = document.querySelector( "[data-bind="btn"]" ); /** * Handle button clicked event * @private */ this.onClick = function(){ console.log( "Button clicked" ); }; button.addEventListener( "click", () => { // we can safely refer surrounding object members this.onClick(); }, false ); } In the preceding example, we subscribed a handler function to a DOM event (click). Within the scope of the handler, we still have access to the view context (this), so we don't need to bind the handler to the outer scope or pass it as a variable through the closure: var that = this; button.addEventListener( "click", function(){ // cross-cutting concerns that.onClick(); }, false ); Method definitions As mentioned in the preceding section, arrow functions can be quite handy when declaring small inline callbacks, but always applying it for a shorter syntax is controversial. However, ES6 provides new alternative method definition syntax besides the arrow functions. The old-school method declaration may look as follows: var foo = { bar: function( param1, param2 ) { } } In ES6 we can get rid of the function keyword and the colon. So the preceding code can be put this way: let foo = { bar ( param1, param2 ) { } } The rest operator Another syntax structure that was borrowed from CoffeeScript came to JavaScript as the rest operator (albeit, the approach is called splats in CoffeeScript). When we had a few mandatory function parameters and an unknown number of rest parameters, we used to do something like this: "use strict"; var cb = function() { // all available parameters into an array var args = [].slice.call( arguments ), // the first array element to foo and shift foo = args.shift(), // the new first array element to bar and shift bar = args.shift(); console.log( foo, bar, args ); }; cb( "foo", "bar", 1, 2, 3 ); // foo bar [1, 2, 3] Now check out how expressive this code becomes in ES6: let cb = function( foo, bar, ...args ) { console.log( foo, bar, args ); } cb( "foo", "bar", 1, 2, 3 ); // foo bar [1, 2, 3] Function parameters aren't the only application of the rest operator. For example, we can use it in destructions as well, as follows: let [ bar, ...others ] = [ "bar", "foo", "baz", "qux" ]; console.log([ bar, others ]); // ["bar",["foo","baz","qux"]] The spread operator Similarly, we can spread array elements into arguments: let args = [ 2015, 6, 17 ], relDate = new Date( ...args ); console.log( relDate.toString() ); // Fri Jul 17 2015 00:00:00 GMT+0200 (CEST) Mastering multiline strings in JavaScript Multi-line strings aren't a good part of JavaScript. While they are easy to declare in other languages (for instance, NOWDOC), you cannot just keep single-quoted or double-quoted strings in multiple lines. This will lead to syntax error as every line in JavaScript is considered as a possible command. You can set backslashes to show your intention: var str = "Lorem ipsum dolor sit amet, n consectetur adipiscing elit. Nunc ornare, n diam ultricies vehicula aliquam, mauris n ipsum dapibus dolor, quis fringilla leo ligula non neque"; This kind of works. However, as soon as you miss a trailing space, you get a syntax error, which is not easy to spot. While most script agents support this syntax, it's, however, not a part of the EcmaScript specification. In the times of EcmaScript for XML (E4X), we could assign a pure XML to a string, which opened a way for declarations such as these: var str = <>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ornare </>.toString(); Nowadays E4X is deprecated, it's not supported anymore. Concatenation versus array join We can also use string concatenation. It may feel clumsy, but it's safe: var str = "Lorem ipsum dolor sit amet, n" + "consectetur adipiscing elit. Nunc ornare,n" + "diam ultricies vehicula aliquam, mauris n" + "ipsum dapibus dolor, quis fringilla leo ligula non neque"; You may be surprised, but concatenation is slower than array joining. So the following technique will work faster: var str = [ "Lorem ipsum dolor sit amet, n", "consectetur adipiscing elit. Nunc ornare,n", "diam ultricies vehicula aliquam, mauris n", "ipsum dapibus dolor, quis fringilla leo ligula non neque"].join( "" ); Template literal What about ES6? The latest EcmaScript specification introduces a new sort of string literal, template literal: var str = `Lorem ipsum dolor sit amet, n consectetur adipiscing elit. Nunc ornare, n diam ultricies vehicula aliquam, mauris n ipsum dapibus dolor, quis fringilla leo ligula non neque`; Now the syntax looks elegant. But there is more. Template literals really remind us of NOWDOC. You can refer any variable declared in the scope within the string: "use strict"; var title = "Some title", text = "Some text", str = `<div class="message"> <h2>${title}</h2> <article>${text}</article> </div>`; console.log( str ); The output is as follows: <div class="message"> <h2>Some title</h2> <article>Some text</article> </div> If you wonder when can you safely use this syntax, I have a good news for you—this feature is already supported by (almost) all the major script agents (http://kangax.github.io/compat-table/es6/). Multi-line strings via transpilers With the advance of ReactJS, Facebook's EcmaScript language extension named JSX (https://facebook.github.io/jsx/) is now really gaining momentum. Apparently influenced by previously mentioned E4X, they proposed a kind of string literal for XML-like content without any screening at all. This type supports template interpolation similar to ES6 templates: "use strict"; var Hello = React.createClass({ render: function() { return <div class="message"> <h2>{this.props.title}</h2> <article>{this.props.text}</article> </div>; } }); React.render(<Hello title="Some title" text="Some text" />, node); Another way to declare multiline strings is by using CommonJS Compiler (http://dsheiko.github.io/cjsc/). While resolving the 'require' dependencies, the compiler transforms any content that is not .js/.json content into a single-line string: foo.txt Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ornare, diam ultricies vehicula aliquam, mauris ipsum dapibus dolor, quis fringilla leo ligula non neque consumer.js var str = require( "./foo.txt" ); console.log( str ); Manipulating arrays in the ES5 way Some years ago when the support of ES5 features was poor (EcmaScript 5th edition was finalized in 2009), libraries such as Underscore and Lo-Dash got highly popular as they provided a comprehensive set of utilities to deal with arrays/collections. Today, many developers still use third-party libraries (including jQuery/Zepro) for methods such as map, filter, every, some, reduce, and indexOf, while these are available in the native form of JavaScript. It still depends on how you use such libraries, but it may likely happen that you don't need them anymore. Let's see what we have now in JavaScript. Array methods in ES5 Array.prototype.forEach is probably the most used method of the arrays. That is, it is the native implementation of _.each, or for example, of the $.each utilities. As parameters, forEach expects an iteratee callback function and optionally a context in which you want to execute the callback. It passes to the callback function an element value, an index, and the entire array. The same parameter syntax is used for most array manipulation methods. Note that jQuery's $.each has the inverted callback parameters order: "use strict"; var data = [ "bar", "foo", "baz", "qux" ]; data.forEach(function( val, inx ){ console.log( val, inx ); }); Array.prototype.map produces a new array by transforming the elements of a given array: "use strict"; var data = { bar: "bar bar", foo: "foo foo" }, // convert key-value array into url-encoded string urlEncStr = Object.keys( data ).map(function( key ){ return key + "=" + window.encodeURIComponent( data[ key ] ); }).join( "&" ); console.log( urlEncStr ); // bar=bar%20bar&foo=foo%20foo Array.prototype.filter returns an array, which consists of given array values that meet the callback's condition: "use strict"; var data = [ "bar", "foo", "", 0 ], // remove all falsy elements filtered = data.filter(function( item ){ return !!item; }); console.log( filtered ); // ["bar", "foo"] Array.prototype.reduce/Array.prototype.reduceRight retrieves the product of values in an array. The method expects a callback function and optionally the initial value as arguments. The callback function receive four parameters: the accumulative value, current one, index and original array. So we can, for an instance, increment the accumulative value by the current one (return acc += cur;) and, thus, we will get the sum of array values. Besides calculating with these methods, we can concatenate string values or arrays: "use strict"; var data = [[ 0, 1 ], [ 2, 3 ], [ 4, 5 ]], arr = data.reduce(function( prev, cur ) { return prev.concat( cur ); }), arrReverse = data.reduceRight(function( prev, cur ) { return prev.concat( cur ); }); console.log( arr ); // [0, 1, 2, 3, 4, 5] console.log( arrReverse ); // [4, 5, 2, 3, 0, 1] Array.prototype.some tests whether any (or some) values of a given array meet the callback condition: "use strict"; var bar = [ "bar", "baz", "qux" ], foo = [ "foo", "baz", "qux" ], /** * Check if a given context (this) contains the value * @param {*} val * @return {Boolean} */ compare = function( val ){ return this.indexOf( val ) !== -1; }; console.log( bar.some( compare, foo ) ); // true In this example, we checked whether any of the bar array values are available in the foo array. For testability, we need to pass a reference of the foo array into the callback. Here we inject it as context. If we need to pass more references, we would push them in a key-value object. As you probably noticed, we used in this example Array.prototype.indexOf. The method works the same as String.prototype.indexOf. This returns an index of the match found or -1. Array.prototype.every tests whether every value of a given array meets the callback condition: "use strict"; var bar = [ "bar", "baz" ], foo = [ "bar", "baz", "qux" ], /** * Check if a given context (this) contains the value * @param {*} val * @return {Boolean} */ compare = function( val ){ return this.indexOf( val ) !== -1; }; console.log( bar.every( compare, foo ) ); // true If you are still concerned about support for these methods in a legacy browser as old as IE6-7, you can simply shim them with https://github.com/es-shims/es5-shim. Array methods in ES6 In ES6, we get just a few new methods that look rather like shortcuts over the existing functionality. Array.prototype.fill populates an array with a given value, as follows: "use strict"; var data = Array( 5 ); console.log( data.fill( "bar" ) ); // ["bar", "bar", "bar", "bar", "bar"] Array.prototype.includes explicitly checks whether a given value exists in the array. Well, it is the same as arr.indexOf( val ) !== -1, as shown here: "use strict"; var data = [ "bar", "foo", "baz", "qux" ]; console.log( data.includes( "foo" ) ); Array.prototype.find filters out a single value matching the callback condition. Again, it's what we can get with Array.prototype.filter. The only difference is that the filter method returns either an array or a null value. In this case, this returns a single element array, as follows: "use strict"; var data = [ "bar", "fo", "baz", "qux" ], match = function( val ){ return val.length < 3; }; console.log( data.find( match ) ); // fo Traversing an object in an elegant, reliable, safe, and fast way It is a common case when we have a key-value object (let's say options) and need to iterate it. There is an academic way to do this, as shown in the following code: "use strict"; var options = { bar: "bar", foo: "foo" }, key; for( key in options ) { console.log( key, options[ key] ); } The preceding code outputs the following: bar bar foo foo Now let's imagine that any of the third-party libraries that you load in the document augments the built-in Object: Object.prototype.baz = "baz"; Now when we run our example code, we will get an extra undesired entry: bar bar foo foo baz baz The solution to this problem is well known, we have to test the keys with the Object.prototype.hasOwnProperty method: //… for( key in options ) { if ( options.hasOwnProperty( key ) ) { console.log( key, options[ key] ); } } Iterating the key-value object safely and fast Let's face the truth—the structure is clumsy and requires optimization (we have to perform the hasOwnProperty test on every given key). Luckily, JavaScript has the Object.keys method that retrieves all string-valued keys of all enumerable own (non-inherited) properties. This gives us the desired keys as an array that we can iterate, for instance, with Array.prototype.forEach: "use strict"; var options = { bar: "bar", foo: "foo" }; Object.keys( options ).forEach(function( key ){ console.log( key, options[ key] ); }); Besides the elegance, we get a better performance this way. In order to see how much we gain, you can run this online test in distinct browsers such as: http://codepen.io/dsheiko/pen/JdrqXa. Enumerating an array-like object Objects such as arguments and nodeList (node.querySelectorAll, document.forms) look like arrays, in fact they are not. Similar to arrays, they have the length property and can be iterated in the for loop. In the form of objects, they can be traversed in the same way that we previously examined. But they do not have any of the array manipulation methods (forEach, map, filter, some and so on). The thing is we can easily convert them into arrays as shown here: "use strict"; var nodes = document.querySelectorAll( "div" ), arr = Array.prototype.slice.call( nodes ); arr.forEach(function(i){ console.log(i); }); The preceding code can be even shorter: arr = [].slice.call( nodes ) It's a pretty convenient solution, but looks like a trick. In ES6, we can do the same conversion with a dedicated method: arr = Array.from( nodes ); The collections of ES6 ES6 introduces a new type of objects—iterable objects. These are the objects whose elements can be retrieved one at a time. They are quite the same as iterators in other languages. Beside arrays, JavaScript received two new iterable data structures, Set and Map. Set which are a collection of unique values: "use strict"; let foo = new Set(); foo.add( 1 ); foo.add( 1 ); foo.add( 2 ); console.log( Array.from( foo ) ); // [ 1, 2 ] let foo = new Set(), bar = function(){ return "bar"; }; foo.add( bar ); console.log( foo.has( bar ) ); // true The map is similar to a key-value object, but may have arbitrary values for the keys. And this makes a difference. Imagine that we need to write an element wrapper that provides jQuery-like events API. By using the on method, we can pass not only a handler callback function but also a context (this). We bind the given callback to the cb.bind( context ) context. This means addEventListener receives a function reference different from the callback. How do we unsubscribe the handler then? We can store the new reference in Map by a key composed from an event name and a callback function reference: "use strict"; /** * @class * @param {Node} el */ let El = function( el ){ this.el = el; this.map = new Map(); }; /** * Subscribe a handler on event * @param {String} event * @param {Function} cb * @param {Object} context */ El.prototype.on = function( event, cb, context ){ let handler = cb.bind( context || this ); this.map.set( [ event, cb ], handler ); this.el.addEventListener( event, handler, false ); }; /** * Unsubscribe a handler on event * @param {String} event * @param {Function} cb */ El.prototype.off = function( event, cb ){ let handler = cb.bind( context ), key = [ event, handler ]; if ( this.map.has( key ) ) { this.el.removeEventListener( event, this.map.get( key ) ); this.map.delete( key ); } }; Any iterable object has methods, keys, values, and entries, where the keys work the same as Object.keys and the others return array values and an array of key-value pairs respectively. Now let's see how we can traverse the iterable objects: "use strict"; let map = new Map() .set( "bar", "bar" ) .set( "foo", "foo" ), pair; for ( pair of map ) { console.log( pair ); } // OR let map = new Map([ [ "bar", "bar" ], [ "foo", "foo" ], ]); map.forEach(function( value, key ){ console.log( key, value ); }); Iterable objects have manipulation methods such as arrays. So we can use forEach. Besides, they can be iterated by for...in and for...of loops. The first one retrieves indexes and the second, the values. Summary This article gives practices and tricks on how to use the JavaScript core features for the maximum effect. This article also discusses the techniques to improve the expressiveness of the code, to master multi-line strings and templating, and to manipulate arrays and array-like objects. Further, we are introduced to the "magic methods" of JavaScript and gives a practical example of their use. JavaScript was born as a scripting language at the most inappropriate time—the time of browser wars. It was neglected and misunderstood for a decade and endured six editions. And look at it now! JavaScript has become a mainstream programming language. You can learn more about JavaScript with the help of the following books: https://www.packtpub.com/application-development/mastering-javascript-design-patterns https://www.packtpub.com/application-development/javascript-promises-essentials https://www.packtpub.com/application-development/learning-javascript-data-structures-and-algorithms Resources for Article: Further resources on this subject: Using JavaScript with HTML[article] Dart with JavaScript[article] Creating a basic JavaScript plugin[article]

0
0
1305

article-image-modular-programming-ecmascript-6

Packt

15 Feb 2016

18 min read

Modular Programming in ECMAScript 6

Packt

15 Feb 2016

18 min read

0
0
4820

article-image-unleashing-creativity-mit-app-inventor-2

Packt

14 Jan 2016

9 min read

Unleashing Creativity with MIT App Inventor 2

Packt

14 Jan 2016

9 min read

This article by Felicia Kamriani and Dr. Krishnendu Roy, authors of the book App Inventor 2 Essentials, covers what is MIT App Inventor 2 and why you should learn to use it? There are apps for just about everything—entertainment, socializing, dining, traveling, philanthropy, shopping, education, navigation, and so on. And just about everyone with a smartphone or tablet is using them to make their lives easier or better. But you have decided to move from just using mobile apps to creating mobile apps. Congratulations! Thanks to MIT App Inventor 2, mobile app development is no longer exclusively the realm of experienced software programmers. It empowers anyone with an idea to create technology. This book offers people of all ages a step-by-step guide to creating mobile apps with App Inventor. While this visual programming language is an ideal tool for people who have little or no coding experience, don’t be fooled into thinking that the capabilities of MIT App Inventor 2 are basic! The simple drag-and-drop blocks format is actually a powerful software language capable of creating complex and sophisticated mobile apps. The purpose of this article is to provide an overview of MIT App Inventor 2, and of your new role as a mobile app developer. You are in for more skill building than you ever imagined! Of course, you will learn to code mobile apps, but there are countless other valuable skills weaved into the mobile app building process. Most significantly, you will learn to think differently, master the design-thinking process, become a problem solver, and be resourceful. This article also offers tips on brainstorming app ideas and design principles. Lastly, it reveals the potential of MIT App Inventor 2 and showcases an array of mobile apps so that you, a budding app designer, can begin thinking about the full spectrum of possibilities. (For more resources related to this topic, see here.) What is MIT App Inventor 2? MIT App Inventor 2 is a free, drag-and-drop, blocks-based visual programming language that enables people, regardless of coding experience, to create mobile apps for Android devices. In 2008, iPhones and Android phones had just hit the market and MIT professor Hal Abelson had the idea to create an easy-to-use programming language to make mobile apps that would harness the power of the emerging smartphone technology. Equipped with fast processors, large memory storage, and sensors, smartphones were enabling people to monitor and interact with their environment like never before. Abelson’s goal was to democratize the mobile app development process by making it easy for anyone to create mobile apps that were meaningful and important to them. While on sabbatical at Google in Mountain View, CA, Abelson worked with engineer Mark Friedman to create App Inventor (yes, it was originally called Google App Inventor!). In 2011, Abelson brought App Inventor to MIT and together with the Media Lab and CSAIL (the Computer Science and Artificial Intelligence Lab) created the Center for Mobile Learning. In December 2013, Abelson and his team of developers launched MIT App Inventor 2, (from here on referred to as MIT App Inventor) an even easier to use web-based application version with an Integrated Development Environment (IDE). This means that you can see your app come to life on your smartphone as you are building it. All you need is a computer (Mac or PC), an internet connection (or a USB connection), a Google Gmail account and an Android device (phone or tablet). But, if you don’t have an Android device, don’t worry! You can still create apps with the on-screen Emulator. The MIT App Inventor (http://appinventor.mit.edu/) browser includes a Designer Screen, a graphical user interface (GUI) where you create the look and feel of the app (choosing which components you want it to include) and the Blocks Editor, where you add behavior to the app by coding with colorful blocks. Users build apps by dragging components and blocks from menu bars onto a workspace and a connected Android device (or Emulator) displays progress in real time. All the apps are saved on the MIT server and once completed, they can be can be shared on the MIT App Inventor Gallery, submitted to app contests (such as MIT App of the Month) or uploaded to the Google Play store (or other app marketplaces) for sharing or selling. To date, MIT App Inventor has empowered millions of people to become creators of technology by learning to be mobile app developers. And now, you will become one of them! Understanding your role as a mobile app developer Since you are reading this book, it is safe to assume that not only do you regularly use mobile apps, but on occasion, you have also had the thought, “I wish there were an app for that!” Now, with the help of MIT App Inventor and this guidebook to mobile app development, you will soon be able to say, “I can create an app for that!” In embracing your new role as a mobile app developer, you will not just be learning how to code, but you will also learn an array of other valuable skills. You will learn to think differently. Every time you open an app, you will start looking at it from the developer’s perspective rather than just as a user. You will start noticing what functions are logical and smooth and which are choppy and unintuitive. You will learn to get inspiration from your environment. What type of app could make the attendance process at my club/class/meeting more streamlined or efficient? What app idea could help solve the problem of inaccurate inventory at the gym? You will learn to become a data gatherer without even realizing it. When people make comments about apps, your ears will perk up and you will take note. You will start asking questions like why do you prefer Waze to Google Maps? You will learn to think logically so that you can tell the computer in a step-by-step manner how to perform an operation. You will learn to become a problem solver. Any coder will confirm that programming is an iterative process. It’s a continual cycle of coding, troubleshooting and debugging. Trial and error will become second nature, as will taking a step back to figure out why something that just worked a minute ago now seems broken. And, you will learn to assume the role of a designer. It is no longer accurate to merely depict programmers holed up by themselves at a computer creating white text-based code on black screens. Coders of mobile apps are also designers who think about and create attractive and intuitive user interfaces (UIs). Much of the design work happens not at the computer—it includes conversations with potential users, involves pens paper, and post-it notes, and uses story-boards or sketches. Only once you have your app designed on paper do you sit down at the computer to begin coding. And then, you will not find the traditional black and white interface, as the MIT App Inventor platform is interactive and full of colorful blocks that snap together. Brainstorming app ideas Chances are you already have an idea for a mobile app. But if not, how can you think of one? Sometimes, you have so many ideas; it’s hard to narrow them down to just one. The best way to start brainstorming app ideas is by starting with what you know. What’s an app you wish existed? What’s an app you and your friends, co-workers, or family members would use, need, or like? What’s a problem in your community, network, or circle of friends that could be solved with a digital solution? Maybe, you loan out books to each friends, but don’t have a system to keep track of who borrowed what. Maybe, you want to do a clothing swap with people who are your size so you want to post pictures of the items that you have available for trade and you want to view listed items in your size. Maybe, you have a favorite app that you use all the time but wish it just had this one other feature. Maybe, when you meet your friends in a public place, it’s hard to know if they’re nearby without a lot of texting back and forth, so you want to create an app that shows everyone’s location on one screen. The possibilities are endless! The key to successful brainstorming is to write down all of your ideas no matter how wild they are and talk to people about them to get feedback. Input from others is an essential part of the research needed to ensure that your app idea becomes a successful app that people will want, use, and/or buy. On a recent business trip, we had an idea for a travel app because we always seemed to forget at least one essential item. Over breakfast at the hotel, we discussed the app idea with a couple of colleagues and received amazing insight that we hadn’t thought of like a reminder notification to fill any prescriptions well before the trip and a weather component so we could be sure to pack appropriate clothes for each destination. The more people you talk to, the more market research you will conduct and the more defined the overall app concept will be. Summary This article highlights the many other learning outcomes from engaging in the mobile app development process. Taking an app concept and building it out into an actual mobile app is both a concrete and a creative process. Attention to detail and iteration is vital for both code and design to work effectively and synergistically. Whether you’re creating a game to play with your friend, an app to promote philanthropy involvement on campus, or an app to kickstart a recycling program in your neighborhood, the design thinking process is as much a part of app development as coding. Skills such as brainstorming, research, interviewing, synthesizing, ideating, storyboarding, designing, troubleshooting, problem solving, and testing are not only integral to app building, they are also transferrable to other disciplines, helping to unlock creativity and flow in any endeavor. Resources for Article: Further resources on this subject: Google Apps: Surfing the Web [article] Introduction to IT Inventory and Resource Management [article] How to Expand your Knowledge [article]

0
0
2393

Packt

12 Jan 2016

13 min read

Using JavaScript with HTML

Packt

12 Jan 2016

13 min read

In this article by Syed Omar Faruk Towaha, author of the book JavaScript Projects for Kids, we will discuss about HTML, HTML canvas, implementing JavaScript codes on our HTML pages, and few JavaScript operations. (For more resources related to this topic, see here.) HTML HTML is a markup language. What does it mean? Well, a markup language processes and presents texts using specific codes for formatting, styling, and layout design. There are lots of markup languages; for example, Business Narrative Markup Language (BNML), ColdFusion Markup Language (CFML), Opera Binary Markup Language (OBML), Systems Biology Markup Language (SBML), Virtual Human Markup Language (VHML), and so on. However, in modern web, we use HTML. HTML is based on Standard Generalized Markup Language (SGML). SGML was basically used to design document papers. There are a number of versions of HTML. HTML 5 is the latest version. Throughout this book, we will use the latest version of HTML. Before you start learning HTML, think about your favorite website. What does the website contain? A few web pages? You may see some texts, few images, one or two text field, buttons, and some more elements on each of the webpages. Each of these elements are formatted by HTML. Let me introduce you to a web page. On your Internet browser, go to https://www.google.com. You will see a page similar to the following image: The first thing that you will see on the top of your browser is the title of the webpage: Here, the marked box, 1, is the title of the web page that we loaded. The second box, 2, indicates some links or some texts. The word Google in the middle of the page is an image. The third box, 3, indicates two buttons. Can you tell me what Sign in on the right-hand top of the page is? Yes, it is a button. Let's demonstrate the basic structure of HTML. The term tag will be used frequently to demonstrate the structure. An HTML tag is nothing but a few predefined words between the less than sign (<) and the greater than sign (>). Therefore, the structure of a tag is <WORD>, where WORD is the predefined text that is recognized by the Internet browsers. This type of tag is called open tag. There is another type of tag that is known as a close tag. The structure of a close tag is as </WORD>. You just have to put a forward slash after the less than sign. After this section, you will be able to make your own web page with a few texts using HTML. The structure of an HTML page is similar to the following image: This image has eight tags. Let me introduce all these tags with their activities: 1:This is the <html> tag, which is an open tag and it closes at line 15 with the </html> tag. These tags tell your Internet browser that all the texts and scripts in these two tags are HTML documents. 2:This is the <head> tag, which is an open tag and closes at line 7 with the </head> tag. These tags contain the title, script, style, and metadata of a web page. 3:This is the <title> tag, which closes at line 6 with the </title> tag. This tag contains the title of a webpage. The previous image had the title Google. To see this on the web browser you need to type like that: <title> Google </title> 4:This is the close tag of <title> tag 5:This is the closing tag of <head> tag 6:This is the <body> tag, closes at line 13 with tag </body> Whatever you can see on a web page is written between these two tags. Every element, image, link, and so on are formatted here. To see This is a web page on your browser, you need to type similar to the following: <body> This is a web page </body> 7:The </body> tag closes here. 8:The </html> tag is closed here. Your first webpage You have just learned the eight basic tags of an HTML page. You can now make your own web page. How? Why not you try with me? Open your text editor. Press Ctrl + N, which will open a new untitled file as shown in the following image: Type the following HTML codes on the blank page: <html> <head> <title> My Webpage! </title> </head> <body> This is my webpage :) </body> </html> Then, press Ctrl + Shift + S that will tell you to save your code somewhere on your computer, as follows: Type a suitable name on the File Name: field. I would like to name my HTML file webpage, therefore, I typed webpage.html. You may be thinking why I added an .html extension. As this is an HTML document, you need to add .html or .htm after the name that you give to your webpage. Press the Save button.This will create an HTML document on your computer. Go to the directory, where you saved your HTML file. Remember that you can give your web page any name. However, this name will not be visible on your browser. It is not the title of your webpage. It is a good practice not to keep a blank space on your web page's name. Consider that you want to name your HTML file This is my first webpage.html. Your computer will face no trouble showing the result on the Internet browsers; however, when your website will be on a server, this name may face a problem. Therefore, I suggest you to keep an underscore (_) where you need to add a space similar to the following: This_is_my_first_webpage.html. You will find a file similar to the following image: Now, double-click on the file. You will see your first web page on your Internet browser! You typed My Webpage! between the <title> and </title> tags, which is why your browser is showing this in the first selection box, 1. Also, you typed This is my webpage :) between the <body> and </body> tags. Therefore, you can see the text on your browser in the second selection box, 2. Congratulations! You created your first web page! You can edit your codes and other texts of the webpage.html file by right-clicking on the file and selecting Open with Atom. You must save (Ctrl + S) your codes and text before reopening the file with your browser. Implementing Canvas To add canvas on your HTML page, you need to define the height and width of your canvas in the <canvas> and </canvas> tags as shown in the following: <html> <head> <title>Canvas</title> </head> <body> <canvas id="canvasTest" width="200" height="100" style="border:2px solid #000;"> </canvas> </body> </html> We have defined canvas id as canvasTest, which will be used to play with the canvas. We used the inline CSS on our canvas. A 2 pixel solid border is used to have a better view of the canvas. Adding JavaScript Now, we are going to add few lines of JavaScript to our canvas. We need to add our JavaScript just after the <canvas>…</canvas> tags in the <script> and </script> tags. Drawing a rectangle To test our canvas, let's draw a rectangle in the canvas by typing the following codes: <script type="text/javascript"> var canvas = document.getElementById("canvasTest"); //called our canvas by id var canvasElement = canvas.getContext("2d"); // made our canvas 2D canvasElement.fillStyle = "black"; //Filled the canvas black canvasElement.fillRect(10, 10, 50, 50); //created a rectangle </script> In the script, we declared two JavaScript variables. The canvas variable is used to hold the content of our canvas using the canvas ID, which we used on our <canvas>…</canvas> tags. The canvasElement variable is used to hold the context of the canvas. We made fillStyle black so that the rectangle that we want to draw becomes black when filled. We used canvasElement.fillRect(x, y, w, h); for the shape of the rectangle. Where x is the distance of the rectangle from the x axis, y is the distance of the rectangle from the y axis. The w and h parameters are the width and height of the rectangle, respectively. The full code is similar to the following: <html> <head> <title>Canvas</title> </head> <body> <canvas id="canvasTest" width="200" height="100" style="border:2px solid #000;"> </canvas> <script type="text/javascript"> var canvas = document.getElementById("canvasTest"); //called our canvas by id var canvasElement = canvas.getContext("2d"); // made our canvas 2D canvasElement.fillStyle = "black"; //Filled the canvas black canvasElement.fillRect(10, 10, 50, 50); //created a rectangle </script> </body> </html> The output of the code is as follows: Drawing a line To draw a line in the canvas, you need to insert the following code in your <script> and </script> tags: <script type="text/javascript"> var c = document.getElementById("canvasTest"); var canvasElement = c.getContext("2d"); canvasElement.moveTo(0,0); canvasElement.lineTo(100,100); canvasElement.stroke(); </script> Here, canvasElement.moveTo(0,0); is used to make our line start from the (0,0) co-ordinate of our canvas. The canvasElement.lineTo(100,100); statement is used to make the line diagonal. The canvasElement.stroke(); statement is used to make the line visible. Here is the output of the code: A quick exercise Draw a line using canvas and JavaScript that will be parallel to the y axis of the canvas Draw a rectangle having 300 px height and 200 px width and draw a line on the same canvas touching the rectangle. Assignment operators An assignment operator assigns a value to an operator. I believe you that already know about assignment operators, don't you? Well, you used an equal sign (=) between a variable and its value. By doing this, you assigned the value to the variable. Let's see the following example: Var name = "Sherlock Holmes" The Sherlock Holmes string is assigned to the name variable. You already learned about the increment and decrement operators. Can you tell me what will be the output of the following codes: var x = 3; x *= 2; document.write(x); The output will be 6. Do you remember why this happened? The x *= 2; statement is similar to x = x * 2; as x is equal to 3 and later multiplied by 2. The final number (3 x 2 = 6) is assigned to the same x variable. That's why we got the output as shown in the following screenshot: Let's perform following exercise: What is the output of the following codes: var w = 32; var x = 12; var y = 9; var z = 5; w++; w--; x*2; y = x; y--; z%2; document.write("w = "+w+", x = "+x+", y = "+y+", z = "+z); The output that we get is w = 32, x = 12, y = 11, z = 5. JavaScript comparison and logical operators If you want to do something logical and compare two numbers or variables in JavaScript, you need to use a few logical operators. The following are a few examples of the comparison operators: Operator Description == Equal to != Not equal to > Greater than < Less than => Equal to or greater than <= Less than or equal to The example of each operator is shown in the following screenshot: According to mozilla.org, "Object-oriented programming (OOP) is a programming paradigm that uses abstraction to create models based on the real world. OOP uses several techniques from previously established paradigms, including modularity, polymorphism, and encapsulation." Nicholas C. Zakas states that "OOP languages typically are identified through their use of classes to create multiple objects that have the same properties and methods." You probably have assumed that JavaScript is an object-oriented programming language. Yes, you are absolutely right. Let's see why it is an OOP language. We call a computer programming language object oriented, if it has the following few features: Inheritance Polymorphism Encapsulation Abstraction Before going any further, let's discuss objects. We create objects in JavaScript in the following manner. var person = new Object(); person.name = "Harry Potter"; person.age = 22; person.job = "Magician"; We created an object for person. We added a few properties of person. If we want to access any property of the object, we need to call the property. Consider that you want to have a pop up of the name property of the preceding person object. You can do this with the following method: person.callName = function(){ alert(this.name); }; We can write the preceding code as the following: var person = { name: "Harry Potter", age: 22, job: "Magician", callName: function(){ alert(this.name); } }; Inheritance in JavaScript Inherit means derive something (characteristics, quality, and so on) from one's parents or ancestors. In programming languages, when a class or an object is based on another class or object in order to maintain the same behavior of mother class or object is known as inheritance. We can also say that this is a concept of gaining properties or behaviors of something else. Suppose, X inherits something from Y, it is like X is a type of Y. JavaScript occupies the inheritance capability. Let's see an example. A bird inherits from animal as a bird is a type of animal. Therefore, a bird can do the same thing as an animal. This kind of relationship in JavaScript is a little complex and needs a syntax. We need to use a special object called prototype, which assigns the properties to a type. We need to remember that only function has prototypes. Our Animal function should look similar to the following: function Animal(){ //We can code here. }; To add few properties to the function, we need to add a prototype, as shown in the following: Animal.prototype.eat = function(){ alert("Animal can eat."); }; Summary In this article, you have learned how to write HTML code and implement JavaScript code with the HTML file and HTML canvas. You have learned a few arithmetic operations with JavaScript. The sections in this article are from different chapters of the book, therefore, the flow may look like not in line. I hope you read the original book and practice the code that we discussed there. Resources for Article: Further resources on this subject: Walking You Through Classes [article] JavaScript Execution with Selenium [article] Object-Oriented JavaScript with Backbone Classes [article]

0
0
2321

Packt

10 Jan 2016

9 min read

Cython Won't Bite

Packt

10 Jan 2016

9 min read

In this article by Philip Herron, the author of the book Learning Cython Programming - Second Edition, we see how Cython is much more than just a programminglanguage. Its origin can be traced to Sage, the mathematics software package, where it was used to increase the performance of mathematical computations, such as those involving matrices. More generally, I tend to consider Cython as an alternative to Swig to generate really good python bindings to native code. Language bindings have been around for years and Swig was one of the first and best tools to generate bindings for multitudes of languages. Cython generates bindings for Python code only, and this single purpose approach means it generates the best Python bindings you can get outside of doing it all manually; attempt the latter only if you're a Python core developer. For me, taking control of legacy software by generating language bindings is a great way to reuse any software package. Consider a legacy application written in C/C++; adding advanced modern features like a web server for a dashboard or message bus is not a trivial thing to do. More importantly, Python comes with thousands of packages that have been developed, tested, and used by people for a long time, and can do exactly that. Wouldn't it be great to take advantage of all of this code? With Cython, we can do exactly this, and I will demonstrate approaches with plenty of example codes along the way. This article will be dedicated to the core concepts on using Cython, including compilation, and will provide a solid reference and introduction for all to Cython core concepts. In this article, we will cover: Installing Cython Getting started - Hello World Using distutils with Cython Calling C functions from Python Type conversion (For more resources related to this topic, see here.) Installing Cython Since Cython is a programming language, we must install its respective compiler, which just so happens to be so aptly named Cython. There are many different ways to install Cython. The preferred one would be to use pip: $ pip install Cython This should work on both Linux and Mac. Alternatively, you can use your Linux distribution's package manager to install Cython: $ yum install cython # will work on Fedora and Centos $ apt-get install cython # will work on Debian based systems In Windows, although there are a plethora of options available, following this Wiki is the safest option to stay up to date: http://wiki.cython.org/InstallingOnWindows Emacs mode There is an emacs mode available for Cython. Although the syntax is nearly the same as Python, there are differences that conflict in simply using Python mode. You can choose to grab the cython-mode.el from the Cython source code (inside the Tools directory.) The preferred way of installing packages to emacs would be to use a package repository such as MELPA(). To add the package repository to emacs, open your ~/.emacs configuration file and add the following code: (when (>= emacs-major-version 24) (require 'package) (add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/") t) (package-initialize)) Once you add this and reload your configuration to install the cython mode, you can simply run the following: 'M-x package-install RET cython-mode' Once this is installed, you can activate the mode by adding this into your emacs config file: (require 'cython-mode) You can always activate the mode manually at any time with the following: 'M-x cython-mode RET' Getting the code examples Throughout this book, I intend to show real examples that are easy to digest to help you get a feel of the different things you can achieve with Cython. To access and download the code used, please clone the following repository: $ git clone git://github.com/redbrain/cython-book.git Getting started – Hello World As you will see when running the Hello World program, Cython generates native python modules. Therefore, while running any Cython code, you will reference it via a module import in Python. Let's build the module: $ cd cython-book/chapter1/helloworld $ make You should have now created helloworld.so! This is a Cython module of the same name of the Cython source code file. While in the same directory of the shared object module, you can invoke this code by running a respective Python import: $ python Python 2.7.3 (default, Aug 1 2012, 05:16:07) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import helloworld Hello World from cython! As you can see from opening helloworld.pyx, it looks just like a normal Python Hello World application; but as previously stated, Cython generates modules. These modules need a name so that it can be correctly imported by the python runtime. The Cython compiler simply uses the name of the source code file. It then requires us to compile this to the same shared object name. Overall, Cython source code files have the .pyx,.pxd, and .pxi extensions. For now, all we care about are the .pyx files; the others are for cimports and includes respectively within a .pyx module file. The following screenshot depicts the compilation flow required to have a callable native python module: I wrote a basic makefile so that you can simply run make to compile these examples. Here's the code to do this manually: $ cython helloworld.pyx $ gcc/clang -g -O2 -fpic `python-config --cflags` -c helloworld.c -o helloworld.o $ gcc/clang -shared -o helloworld.so helloworld.o `python-config –libs Using DistUtils with Cython You can compile this using Python distutils and cythonize. Open setup.py: from distutils.core import setup from Cython.Build import cythonize setup( ext_modules = cythonize("helloworld.pyx") ) Using the cythonize function as part of the ext_modules section will build any specified Cython source into an installable Python module. This will compile helloworld.pyx into the same shared library. This provides the Python practice to distribute native modules as part of distutils. Calling C functions from Python We should be careful when talking about Python and Cython for clarity, since the syntax is so similar. Let's wrap a simple AddFunction in C and make it callable from Python. Firstly, open a file called AddFunction.c, and write a simple function into it: #include <stdio.h> int AddFunction(int a, int b) { printf("look we are within your c code!n"); return a + b; } This is the C code we will call, which is just a simple function to add two integers. Now, let's get Python to call it. Open a file called AddFunction.h, wherein we will declare our prototype: #ifndef __ADDFUNCTION_H__ #define __ADDFUNCTION_H__ extern int AddFunction (int, int); #endif //__ADDFUNCTION_H__ We need this so that Cython can see the prototype for the function we want to call. In practice, you will already have your headers in your own project with your prototypes and declarations already available. Open a file called AddFunction.pyx, and insert the following code in to it: cdef extern from "AddFunction.h": cdef int AddFunction(int, int) Here, we have to declare what code we want to call. The cdef is a keyword signifying that this is from the C code that will be linked in. Now, we need a Python entry point: def Add(a, b): return AddFunction(a, b) This Add is a Python callable inside a PyAddFunction module. Again, I have provided a handy makefile to produce the module: $ cd cython-book/chapter1/ownmodule $ make cython -2 PyAddFunction.pyx gcc -g -O2 -fpic -c PyAddFunction.c -o PyAddFunction.o `python-config --includes` gcc -g -O2 -fpic -c AddFunction.c -o AddFunction.o gcc -g -O2 -shared -o PyAddFunction.so AddFunction.o PyAddFunc-tion.o `python-config --libs` Notice that AddFunction.c is compiled into the same PyAddFunction.so shared object. Now, let's call this AddFunction and check to see if C can add numbers correctly: $ python >>> from PyAddFunction import Add >>> Add(1,2) look we are within your c code!! 3 Notice the print statement inside AddFunction.c::AddFunction and that the final result is printed correctly. Therefore, we know the control hit the C code and did the calculation in C and not inside the Python runtime. This is a revelation to what is possible. Python can be cited to be slow in some circumstances. Using this technique, it makes it possible for Python code to bypass its own runtime and to run in an unsafe context, which is unrestricted by the Python runtime, which is much faster. Type conversion Notice that we had to declare a prototype inside the cython source code PyAddFunction.pyx: cdef extern from "AddFunction.h": cdef int AddFunction(int, int) It let the compiler know that there is a function called AddFunction, and that it takes two int's and returns an int. This is all the information the compiler needs to know besides the host and target operating system's calling convention in order to call this function safely. Then, we created the Python entry point, which is a python callable that takes two parameters: def Add(a, b): return AddFunction(a, b) Inside this entry point, it simply returned the native AddFunction and passed the two Python objects as parameters. This is what makes Cython so powerful. Here, the Cython compiler must inspect the function call and generate code to safely try and convert these Python objects to native C integers. This becomes difficult when precision is taken into account, and potential overflow, which just so happens to be a major use case since it handles everything so well. Also, remember that this function returns an integer and Cython also generates code to convert the integer return into a valid Python object. Summary Overall, we installed the Cython compiler, ran the Hello World example, and took into consideration that we need to compile all code into native shared objects. We also saw how to wrap native C code to be callable from Python, and how to do type conversion of parameters and return to C code and back to Python. Resources for Article: Further resources on this subject: Monte Carlo Simulation and Options [article] Understanding Cython [article] Scaling your Application Across Nodes with Spring Python's Remoting [article]

0
0
1261

article-image-parallelization-using-reducers

Packt

06 Jan 2016

18 min read

Parallelization using Reducers

Packt

06 Jan 2016

18 min read

In this article by Akhil Wali, the author of the book Mastering Clojure, we will study this particular abstraction of collections and how it is quite orthogonal to viewing collections as sequences. Sequences and laziness are great way of handling collections. The Clojure standard library provides several functions to handle and manipulate sequences. However, abstracting a collection as a sequence has an unfortunate consequence—any computation that is performed over all the elements of a sequence is inherently sequential. All standard sequence functions create a new collection that be similar to the collection they are passed. Interestingly, performing a computation over a collection without creating a similar collection—even as an intermediary result—is quite useful. For example, it is often required to reduce a given collection to a single value through a series of transformations in an iterative manner. This sort of computation does not necessarily require the intermediary results of each transformation to be saved. (For more resources related to this topic, see here.) A consequence of iteratively computing values from a collection is that we cannot parallelize it in a straightforward way. Modern map-reduce frameworks handle this kind of computation by pipelining the elements of a collection through several transformations in parallel and finally reducing the results into a single result. Of course, the result could be a new collection as well. A drawback is that this methodology produces concrete collections as intermediate results of each transformation, which is rather wasteful. For example, if we want to filter values from a collection, a map-reduce strategy would require creating empty collections to represent values that are left out of the reduction step to produce the final result. This incurs unnecessary memory allocation and also creates additional work for the reduction step that produces the final result. Hence, there’s scope for optimizing these kinds of computations. This brings us to the notion of treating computations over collections as reducers to attain better performance. Of course, this doesn't mean that reducers are a replacement for sequences. Sequences and laziness are great for abstracting computations that create and manipulate collections, while reducers are specialized high-performance abstractions of collections in which a collection needs to be piped through several transformations and combined to produce the final result. Reducers achieve a performance gain in the following ways: Reducing the amount of memory allocated to produce the desired result Parallelizing the process of reducing a collection into a single result, which could be an entirely new collection The clojure.core.reducers namespace provides several functions for processing collections using reducers. Let's now examine how reducers are implemented and also study a few examples that demonstrate how reducers can be used. Using reduce to transform collections Sequences and functions that operate on sequences preserve the sequential ordering between the constituent elements of a collection. Lazy sequences avoid unnecessary realization of elements in a collection until they are required for a computation, but the realization of these values is still performed in a sequential manner. However, this characteristic of sequential ordering may not be desirable for all computations performed over it. For example, it's not possible to map a function over a vector and then lazily realize values in the resulting collection by random access, since the map function converts the collection that it is supplied into a sequence. Also, functions such as map and filter are lazy but still sequential by nature. Consider a unary function as shown in Example 3.1 that we intend to map it over a given vector. The function must compute a value from the one it is supplied, and also perform a side effect so that we can observe its application over the elements in a collection: Example 3.1. A simple unary function (defn square-with-side-effect [x] (do (println (str "Side-effect: " x)) (* x x))) The square-with-side-effect function defined here simply returns the square of a number x using the * function. This function also prints the value of x using a println form whenever it is called. Suppose this function is mapped over a given vector. The resulting collection will have to be realized completely if a computation has to be performed over it, even if all the elements from the resulting vector are not required. This can be demonstrated as follows: user> (def mapped (map square-with-side-effect [0 1 2 3 4 5])) #'user/mapped user> (reduce + (take 3 mapped)) Side-effect: 0 Side-effect: 1 Side-effect: 2 Side-effect: 3 Side-effect: 4 Side-effect: 5 5 As previously shown, the mapped variable contains the result of mapping the square-with-side-effect function over a vector. If we try to sum the first three values in the resulting collection using the reduce, take, and + functions, all the values in the [0 1 2 3 4 5] vector are printed as a side effect. This means that the square-with-side-effect function was applied to all the elements in the initial vector, despite the fact that only the first three elements were actually required by the reduce form. Of course, this can be solved by using the seq function to convert the vector to a sequence before we map the square-with-side-effect function over it. But then, we lose the ability to efficiently access elements in a random order in the resulting collection. To dive deeper into why this actually happens, you first need to understand how the standard map function is actually implemented. A simplified definition of the map function is shown here: Example 3.2. A simplified definition of the map function (defn map [f coll] (cons (f (first coll)) (lazy-seq (map f (rest coll))))) The definition of map in Example 3.2 is a simplified and rather incomplete one, as it doesn't check for an empty collection and cannot be used over multiple collections. That aside, this definition of map does indeed apply a function f to all the elements in a coll collection. This is implemented using a composition of the cons, first, rest, and lazy-seq forms. The implementation can be interpreted as, "apply the f function to the first element in the coll collection, and then map f over the rest of the collection in a lazy manner." An interesting consequence of this implementation is that the map function has the following characteristics: The ordering among the elements in the coll collection is preserved. This computation is performed recursively. The lazy-seq form is used to perform the computation in a lazy manner. The use of the first and rest forms indicates that coll must be a sequence, and the cons form will also produce a result that is a sequence. Hence, the map function accepts a sequence and builds a new one. Another interesting characteristic about lazy sequences is that they are realized in chunks. This means that a lazy sequence is realized in chunks of 32 elements, each as an optimization, when the values in the sequence are actually required. Sequences that behave this way are termed as chunked sequences. Of course, not all sequences are chunked, and we can check whether a given sequence is chunked using the chunked-seq? predicate. The range function returns a chunked sequence, shown as follows: user> (first (map #(do (print !) %) (range 70))) !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 0 user> (nth (map #(do (print !) %) (range 70)) 32) !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 32 Both the statements in the output shown previously select a single element from a sequence returned by the map function. The function passed to the map function in both the above statements prints the ! character and returns the value supplied to it. In the first statement, the first 32 elements of the resulting sequence are realized, even though only the first element is required. Similarly, the second statement is observed to realize the first 64 elements of the resulting sequence when the element at the 32nd position is obtained using the nth function. Chunked sequences have been an integral part of Clojure since version 1.1. However, none of the properties of the sequences described above are needed to transform a given collection into a result that is not a sequence. If we were to handle such computations efficiently, we cannot build on functions that return sequences, such as map and filter. Incidentally, the reduce function does not necessarily produce a sequence. It also has a couple of other interesting properties: The reduce function actually lets the collection it is passed to define how it is computed over or reduced. Thus, reduce is collection independent. Also, the reduce function is versatile enough to build a single value or an entirely new collection as well. For example, using reduce with the * or + function will create a single-valued result, while using it with the cons or concat function can create a new collection as the result. Thus, reduce can build anything. A collection is said to be reducible if it defines how it can be reduced to a single result. The binary function that is used by the reduce function along with a collection is also termed as a reducing function. A reducing function requires two arguments: one to represent the result of the computation so far, and another to represent an input value that has to be combined into the result. Several reducing functions can be composed into one, which effectively changes how the reduce function processes a given collection. This composition is done using reducers, which can be thought of as a shortened version of the term reducing function transformers. The use of sequences and laziness can be compared to the use of reducers to perform a given computation by Rich Hickey's infamous pie-maker analogy. Suppose a pie-maker has been supplied a bag of apples with an intent to reduce the apples to a pie. There are a couple of transformations needed to perform this task. First, the stickers on all the apples have to be removed; as in, we map a function to "take the sticker off" the apples in the collection. Also, all the rotten apples will have to be removed, which is analogous to using the filter function to remove elements from a collection. Instead of performing this work herself, the pie-maker delegates it to her assistant. The assistant could first take the stickers off all the apples, thus producing a new collection, and then take out the rotten apples to produce another new collection, which illustrates the use of lazy sequences. But then, the assistant would be doing unnecessary work by removing the stickers off of the rotten apples, which will have to be discarded later anyway. On the other hand, the assistant could delay this work until the actual reduction of the processed apples into a pie is performed. Once the work actually needs to be performed, the assistant will compose the two tasks of mapping and filtering the collection of apples, thus avoiding any unnecessary work. This case depicts the use of reducers for composing and transforming the tasks needed to effectively reduce a collection of apples to a pie. By using reducers, we create a recipe of tasks to reduce a collection of apples to a pie and delay all processing until the final reduction, instead of dealing with collections of apples as intermediary results of each task: The following namespaces must be included in your namespace declaration for the upcoming examples. (ns my-namespace (:require [clojure.core.reducers :as r])) The clojure.core.reducers namespace requires Java 6 with the jsr166y.jar or Java 7+ for fork/join support. Let's now briefly explore how reducers are actually implemented. Functions that operate on sequences use the clojure.lang.ISeq interface to abstract the behavior of a collection. In the case of reducers, the common interface that we must build upon is that of a reducing function. As we mentioned earlier, a reducing function is a two-arity function in which the first argument is the result produced so far and the second argument is the current input, which has to be combined with the first argument. The process of performing a computation over a collection and producing a result can be generalized into three distinct cases. They can be described as follows: A new collection with the same number of elements as that of the collection it is supplied needs to be produced. This one-to-one case is analogous to using the map function. The computation shrinks the supplied collection by removing elements from it. This can be done using the filter function. The computation could also be expansive, in which case it produces a new collection that contains an increased number of elements. This is like what the mapcat function does. These cases depict the different ways by which a collection can be transformed into the desired result. Any computation, or reduction, over a collection can be thought of as an arbitrary sequence of such transformations. These transformations are represented by transformers, which are functions that transform a reducing function. They can be implemented as shown here in Example 3.3: Example 3.3. Transformers (defn mapping [f] (fn [rf] (fn [result input] (rf result (f input))))) (defn filtering [p?] (fn [rf] (fn [result input] (if (p? input) (rf result input) result)))) (defn mapcatting [f] (fn [rf] (fn [result input] (reduce rf result (f input))))) The mapping, filtering, and mapcatting functions in the above example Example 3.3 represent the core logic of the map, filter, and mapcat functions respectively. All of these functions are transformers that take a single argument and return a new function. The returned function transforms a supplied reducing function, represented by rf, and returns a new reducing function, created using this expression: (fn [result input] ... ). Functions returned by the mapping, filtering, and mapcatting functions are termed as reducing function transformers. The mapping function applies the f function to the current input, represented by the input variable. The value returned by the f function is then combined with the accumulated result, represented by result, using the reducing function rf. This transformer is a frighteningly pure abstraction of the standard map function that applies an f function over a collection. The mapping function makes no assumptions about the structure of the collection it is supplied, or how the values returned by the f function are combined to produce the final result. Similarly, the filtering function uses a predicate, p?, to check whether the current input of the rf reducing function must combined into the final result, represented by result. If the predicate is not true, then the reducing function will simply return the result value without any modification. The mapcatting function uses the reduce function to combine the value result with the result of the (f input) expression. In this transformer, we can assume that the f function will return a new collection and the rf reducing function will somehow combine two collections into a new one. One of the foundations of the reducers library is the CollReduce protocol defined in the clojure.core.protocols namespace. This protocol defines the behavior of a collection when it is passed as an argument to the reduce function, and it is declared as shown below Example 3.4: Example 3.4. The CollReduce protocol (defprotocol CollReduce (coll-reduce [coll rf init])) The clojure.core.reducers namespace defines a reducer function that creates a reducible collection by dynamically extending the CollReduce protocol, as shown in this code Example 3.5: The reducer function (defn reducer ([coll xf] (reify CollReduce (coll-reduce [_ rf init] (coll-reduce coll (xf rf) init))))) The reducer function combines a collection (coll) and a reducing function transformer (xf), which is returned by the mapping, filtering, and mapcatting functions, to produce a new reducible collection. When reduce is invoked on a reducible collection, it will ultimately ask the collection to reduce itself using the reducing function returned by the (xf rf) expression. Using this mechanism, several reducing functions can be composed into a computation that has to be performed over a given collection. Also, the reducer function needs to be defined only once, and the actual implementation of coll-reduce is provided by the collection supplied to the reducer function. Now, we can redefine the reduce function to simply invoke the coll-reduce function implemented by a given collection, as shown herein Example 3.6: Example 3.6. Redefining the reduce function (defn reduce ([rf coll] (reduce rf (rf) coll)) ([rf init coll] (coll-reduce coll rf init))) As shown in the above code Example 3.6, the reduce function delegates the job of reducing a collection to the collection itself using the coll-reduce function. Also, the reduce function will use the rf reducing function to supply the init argument when it is not specified. An interesting consequence of this definition of reduce is that the rf function must produce an identity value when it is supplied no arguments. The standard reduce function even uses the CollReduce protocol to delegate the job of reducing a collection to the collection itself, but it will also fall back on the default definition of reduce if the supplied collection does not implement the CollReduce protocol. Since Clojure 1.4, the reduce function allows a collection to define how it is reduced using the clojure.core.CollReduce protocol. Clojure 1.5 introduced the clojure.core.reducers namespace, which extends the use of this protocol. All the standard Clojure collections, namely lists, vectors, sets, and maps, implement the CollReduce protocol. The reducer function can be used to build a sequence of transformations to be applied on a collection when it is passed as an argument to the reduce function. This can be demonstrated as follows: user> (r/reduce + 0 (r/reducer [1 2 3 4] (mapping inc))) 14 user> (reduce + 0 (r/reducer [1 2 3 4] (mapping inc))) 14 In this output, the mapping function is used with the inc function to create a reducing function transformer that increments all the elements in a given collection. This transformer is then combined with a vector using the reducer function to produce a reducible collection. The call to reduce in both of the above statements is transformed into the (reduce + [2 3 4 5]) expression, thus producing the result as 14. We can now redefine the map, filter, and mapcat functions using the reducer function, as shown belowin Example 3.7: Redefining the map, filter and mapcat functions using the reducer form (defn map [f coll] (reducer coll (mapping f))) (defn filter [p? coll] (reducer coll (filtering p?))) (defn mapcat [f coll] (reducer coll (mapcatting f))) As shown in Example 3.7, the map, filter, and mapcat functions are now simply compositions of the reducer form with the mapping, filtering, and mapcatting transformers respectively. The definitions of CollReduce, reducer, reduce, map, filter, and mapcat are simplified versions of their actual definitions in the clojure.core.reducers namespace. The definitions of the map, filter, and mapcat functions shown in Example 3.7 have the same shape as the standard versions of these functions, shown as follows: user> (r/reduce + (r/map inc [1 2 3 4])) 14 user> (r/reduce + (r/filter even? [1 2 3 4])) 6 user> (r/reduce + (r/mapcat range [1 2 3 4])) 10 Hence, the map, filter, and mapcat functions from the clojure.core.reducers namespace can be used in the same way as the standard versions of these functions. The reducers library also provides a take function that can be used as a replacement for the standard take function. We can use this function to reduce the number of calls to the square-with-side-effect function (from Example 3.1) when it is mapped over a given vector, as shown below: user> (def mapped (r/map square-with-side-effect [0 1 2 3 4 5])) #'user/mapped user> (reduce + (r/take 3 mapped)) Side-effect: 0 Side-effect: 1 Side-effect: 2 Side-effect: 3 5 Thus, using the map and take functions from the clojure.core.reducers namespace as shown above avoids the application of the square-with-side-effect function over all five elements in the [0 1 2 3 4 5] vector, as only the first three are required. The reducers library also provides variants of the standard take-while, drop, flatten, and remove functions which are based on reducers. Effectively, functions based on reducers will require a lesser number of allocations than sequence-based functions, thus leading to an improvement in performance. For example, consider the process and process-with-reducer functions, shown here: Example 3.8. Functions to process a collection of numbers using sequences and reducers (defn process [nums] (reduce + (map inc (map inc (map inc nums))))) (defn process-with-reducer [nums] (reduce + (r/map inc (r/map inc (r/map inc nums))))) This process function in Example 3.8 applies the inc function over a collection of numbers represented by nums using the map function. The process-with-reducer function performs the same action but uses the reducer variant of the map function. The process-with-reducer function will take a lesser amount of time to produce its result from a large vector compared to the process function, as shown here: user> (def nums (vec (range 1000000))) #'user/nums user> (time (process nums)) "Elapsed time: 471.217086 msecs" 500002500000 user> (time (process-with-reducer nums)) "Elapsed time: 356.767024 msecs" 500002500000 The process-with-reducer function gets a slight performance boost as it requires a lesser number of memory allocations than the process function. The performance of this computation can be improved by a greater scale if we can somehow parallelize it. Summary In this article, we explored the clojure.core.reducers library in detail. We took a look at how reducers are implemented and also how we can use reducers to handle large collections of data in an efficient manner. Resources for Article: Further resources on this subject: Getting Acquainted with Storm [article] Working with Incanter Datasets [article] Application Performance [article]

0
0
921

Packt

03 Nov 2015

6 min read

HTML5 APIs

Packt

03 Nov 2015

6 min read

In this article by Dmitry Sheiko author of the book JavaScript Unlocked we will create our first web component. (For more resources related to this topic, see here.) Creating the first web component You might be familiar with HTML5 video element (http://www.w3.org/TR/html5/embedded-content-0.html#the-video-element). By placing a single element in your HTML, you will get a widget that runs a video. This element accepts a number of attributes to set up the player. If you want to enhance this, you can use its public API and subscribe listeners on its events (http://www.w3.org/2010/05/video/mediaevents.html). So, we reuse this element whenever we need a player and only customize it for project-relevant look and feel. If only we had enough of these elements to pick every time we needed a widget on a page. However, this is not the right way to include any widget that we may need in an HTML specification. However, the API to create custom elements, such as video, is already there. We can really define an element, package the compounds (JavaScript, HTML, CSS, images, and so on), and then just link it from the consuming HTML. In other words, we can create an independent and reusable web component, which we then use by placing the corresponding custom element (<my-widget />) in our HTML. We can restyle the element, and if needed, we can utilize the element API and events. For example, if you need a date picker, you can take an existing web component, let's say the one available at http://component.kitchen/components/x-tag/datepicker. All that we have to do is download the component sources (for example, using browser package manager) and link to the component from our HTML code: <link rel="import" href="bower_components/x-tag-datepicker/src/datepicker.js"> Declare the component in the HTML code: <x-datepicker name="2012-02-02"></x-datepicker> This is supposed to go smoothly in the latest versions of Chrome, but this won't probably work in other browsers. Running a web component requires a number of new technologies to be unlocked in a client browser, such as Custom Elements, HTML Imports, Shadow DOM, and templates. The templates include the JavaScript templates. The Custom Element API allows us to define new HTML elements, their behavior, and properties. The Shadow DOM encapsulates a DOM subtree required by a custom element. And support of HTML Imports assumes that by a given link the user-agent enables a web-component by including its HTML on a page. We can use a polyfill (http://webcomponents.org/) to ensure support for all of the required technologies in all the major browsers: <script src="./bower_components/webcomponentsjs/webcomponents.min.js"></script> Do you fancy writing your own web components? Let's do it. Our component acts similar to HTML's details/summary. When one clicks on summary, the details show up. So we create x-details.html, where we put component styles and JavaScript with component API: x-details.html <style> .x-details-summary { font-weight: bold; cursor: pointer; } .x-details-details { transition: opacity 0.2s ease-in-out, transform 0.2s ease-in-out; transform-origin: top left; } .x-details-hidden { opacity: 0; transform: scaleY(0); } </style> <script> "use strict"; /** * Object constructor representing x-details element * @param {Node} el */ var DetailsView = function( el ){ this.el = el; this.initialize(); }, // Creates an object based in the HTML Element prototype element = Object.create( HTMLElement.prototype ); /** @lend DetailsView.prototype */ Object.assign( DetailsView.prototype, { /** * @constracts DetailsView */ initialize: function(){ this.summary = this.renderSummary(); this.details = this.renderDetails(); this.summary.addEventListener( "click", this.onClick.bind( this ), false ); this.el.textContent = ""; this.el.appendChild( this.summary ); this.el.appendChild( this.details ); }, /** * Render summary element */ renderSummary: function(){ var div = document.createElement( "a" ); div.className = "x-details-summary"; div.textContent = this.el.dataset.summary; return div; }, /** * Render details element */ renderDetails: function(){ var div = document.createElement( "div" ); div.className = "x-details-details x-details-hidden"; div.textContent = this.el.textContent; return div; }, /** * Handle summary on click * @param {Event} e */ onClick: function( e ){ e.preventDefault(); if ( this.details.classList.contains( "x-details-hidden" ) ) { return this.open(); } this.close(); }, /** * Open details */ open: function(){ this.details.classList.toggle( "x-details-hidden", false ); }, /** * Close details */ close: function(){ this.details.classList.toggle( "x-details-hidden", true ); } }); // Fires when an instance of the element is created element.createdCallback = function() { this.detailsView = new DetailsView( this ); }; // Expose method open element.open = function(){ this.detailsView.open(); }; // Expose method close element.close = function(){ this.detailsView.close(); }; // Register the custom element document.registerElement( "x-details", { prototype: element }); </script> Further in JavaScript code, we create an element based on a generic HTML element (Object.create( HTMLElement.prototype )). Here we could inherit from a complex element (for example, video) if needed. We register a x-details custom element using the earlier one created as prototype. With element.createdCallback, we subscribe a handler that will be called when a custom element created. Here we attach our view to the element to enhance it with the functionality that we intend for it. Now we can use the component in HTML, as follows: <!DOCTYPE html> <html> <head> <title>X-DETAILS</title>    <link rel="import" href="./x-details.html"> </head> <body> <x-details data-summary="Click me"> Nunc iaculis ac erat eu porttitor. Curabitur facilisis ligula et urna egestas mollis. Aliquam eget consequat tellus. Sed ullamcorper ante est. In tortor lectus, ultrices vel ipsum eget, ultricies facilisis nisl. Suspendisse porttitor blandit arcu et imperdiet. </x-details> </body> </html> Summary This article covered basically how we can create our own custom advanced elements that can be easily reused, restyled, and enhanced. The assets required to render such elements are HTML, CSS, JavaScript, and images are bundled as Web Components. So, we literally can build the Web now from the components similar to how buildings are made from bricks. Resources for Article: Further resources on this subject: An Introduction to Kibana [article] Working On Your Bot [article] Icons [article]

0
0
1082

Packt

28 Oct 2015

40 min read

Understanding Ranges

Packt

28 Oct 2015

40 min read

In this article by Michael Parker, author of the book Learning D, explains since they were first introduced, ranges have become a pervasive part of D. It's possible to write D code and never need to create any custom ranges or algorithms yourself, but it helps tremendously to understand what they are, where they are used in Phobos, and how to get from a range to an array or another data structure. If you intend to use Phobos, you're going to run into them eventually. Unfortunately, some new D users have a difficult time understanding and using ranges. The aim of this article is to present ranges and functional styles in D from the ground up, so you can see they aren't some arcane secret understood only by a chosen few. Then, you can start writing idiomatic D early on in your journey. In this article, we lay the foundation with the basics of constructing and using ranges in two sections: Ranges defined Ranges in use (For more resources related to this topic, see here.) Ranges defined In this section, we're going to explore what ranges are and see the concrete definitions of the different types of ranges recognized by Phobos. First, we'll dig into an example of the sort of problem ranges are intended to solve and in the process, develop our own solution. This will help form an understanding of ranges from the ground up. The problem As part of an ongoing project, you've been asked to create a utility function, filterArray, which takes an array of any type and produces a new array containing all of the elements from the source array that pass a Boolean condition. The algorithm should be nondestructive, meaning it should not modify the source array at all. For example, given an array of integers as the input, filterArray could be used to produce a new array containing all of the even numbers from the source array. It should be immediately obvious that a function template can handle the requirement to support any type. With a bit of thought and experimentation, a solution can soon be found to enable support for different Boolean expressions, perhaps a string mixin, a delegate, or both. After browsing the Phobos documentation for a bit, you come across a template that looks like it will help with the std.functional.unaryFun implementation. Its declaration is as follows: template unaryFun(alias fun, string parmName = "a"); The alias fun parameter can be a string representing an expression, or any callable type that accepts one argument. If it is the former, the name of the variable inside the expression should be parmName, which is "a" by default. The following snippet demonstrates this: int num = 10; assert(unaryFun!("(a & 1) == 0")(num)); assert(unaryFun!("(x > 0)", "x")(num)); If fun is a callable type, then unaryFun is documented to alias itself to fun and the parmName parameter is ignored. The following snippet calls unaryFun first with struct that implements opCall, then calls it again with a delegate literal: struct IsEven { bool opCall(int x) { return (x & 1) == 0; } } IsEven isEven; assert(unaryFun!isEven(num)); assert(unaryFun!(x => x > 0)(num)); With this, you have everything you need to implement the utility function to spec: import std.functional; T[] filterArray(alias predicate, T)(T[] source) if(is(typeof(unaryFun!predicate(source[0]))) { T[] sink; foreach(t; source) { if(unaryFun!predicate(t)) sink ~= t; } return sink; } unittest { auto ints = [1, 2, 3, 4, 5, 6, 7]; auto even = ints.filterArray!(x => (x & 1) == 0)(); assert(even == [2, 4, 6]); } The unittest verifies that the function works as expected. As a standalone implementation, it gets the job done and is quite likely good enough. But, what if, later on down the road, someone decides to create more functions that perform specific operations on arrays in the same manner? The natural outcome of that is to use the output of one operation as the input for another, creating a chain of function calls to transform the original data. The most obvious problem is that any such function that cannot perform its operation in place must allocate at least once every time it's called. This means, chain operations on a single array will end up allocating memory multiple times. This is not the sort of habit you want to get into in any language, especially in performance-critical code, but in D, you have to take the GC into account. Any given allocation could trigger a garbage collection cycle, so it's a good idea to program to the GC; don't be afraid of allocating, but do so only when necessary and keep it out of your inner loops. In filterArray, the naïve appending can be improved upon, but the allocation can't be eliminated unless a second parameter is added to act as the sink. This allows the allocation strategy to be decided at the call site rather than by the function, but it leads to another problem. If all of the operations in a chain require a sink and the sink for one operation becomes the source for the next, then multiple arrays must be declared to act as sinks. This can quickly become unwieldy. Another potential issue is that filterArray is eager, meaning that every time the function is called, the filtering takes place immediately. If all of the functions in a chain are eager, it becomes quite difficult to get around the need for allocations or multiple sinks. The alternative, lazy functions, do not perform their work at the time they are called, but rather at some future point. Not only does this make it possible to put off the work until the result is actually needed (if at all), it also opens the door to reducing the amount of copying or allocating needed by operations in the chain. Everything could happen in one step at the end. Finally, why should each operation be limited to arrays? Often, we want to execute an algorithm on the elements of a list, a set, or some other container, so why not support any collection of elements? By making each operation generic enough to work with any type of container, it's possible to build a library of reusable algorithms without the need to implement each algorithm for each type of container. The solution Now we're going to implement a more generic version of filterArray, called filter, which can work with any container type. It needs to avoid allocation and should also be lazy. To facilitate this, the function should work with a well-defined interface that abstracts the container away from the algorithm. By doing so, it's possible to implement multiple algorithms that understand the same interface. It also takes the decision on whether or not to allocate completely out of the algorithms. The interface of the abstraction need not be an actual interface type. Template constraints can be used to verify that a given type meets the requirements. You might have heard of duck typing. It originates from the old saying, If it looks like a duck, swims like a duck, and quacks like a duck, then it's probably a duck. The concept is that if a given object instance has the interface of a given type, then it's probably an instance of that type. D's template constraints and compile-time capabilities easily allow for duck typing. The interface In looking for inspiration to define the new interface, it's tempting to turn to other languages like Java and C++. On one hand, we want to iterate the container elements, which brings to mind the iterator implementations in other languages. However, we also want to do a bit more than that, as demonstrated by the following chain of function calls: container.getType.algorithm1.algorithm2.algorithm3.toContainer(); Conceptually, the instance returned by getType will be consumed by algorithm1, meaning that inside the function, it will be iterated to the point where it can produce no more elements. But then, algorithm1 should return an instance of the same type, which can iterate over the same container, and which will in turn be consumed by algorithm2. The process repeats for algorithm3. This implies that instances of the new type should be able to be instantiated independent of the container they represent. Moreover, given that D supports slicing, the role of getType previously could easily be played by opSlice. Iteration need not always begin with the first element of a container and end with the last; any range of elements should be supported. In fact, there's really no reason for an actual container instance to exist at all in some cases. Imagine a random number generator; we should be able to plug one into the preceding function chain; just eliminate the container and replace getType with the generator instance. As long as it conforms to the interface we define, it doesn't matter that there is no concrete container instance backing it. The short version of it is, we don't want to think solely in terms of iteration, as it's only a part of the problem we're trying to solve. We want a type that not only supports iteration, of either an actual container or a conceptual one, but one that also can be instantiated independently of any container, knows both its beginning and ending boundaries, and, in order to allow for lazy algorithms, can be used to generate new instances that know how to iterate over the same elements. Considering these requirements, Iterator isn't a good fit as a name for the new type. Rather than naming it for what it does or how it's used, it seems more appropriate to name it for what it represents. There's more than one possible name that fits, but we'll go with Range (as in, a range of elements). That's it for the requirements and the type name. Now, moving on to the API. For any algorithm that needs to sequentially iterate a range of elements from beginning to end, three basic primitives are required: There must be a way to determine whether or not any elements are available There must be a means to access the next element in the sequence There must be a way to advance the sequence so that another element can be made ready Based on these requirements, there are several ways to approach naming the three primitives, but we'll just take a shortcut and use the same names used in D. The first primitive will be called empty and can be implemented either as a member function that returns bool or as a bool member variable. The second primitive will be called front, which again could be a member function or variable, and which returns T, the element type of the range. The third primitive can only be a member function and will be called popFront, as conceptually it is removing the current front from the sequence to ready the next element. A range for arrays Wrapping an array in the Range interface is quite easy. It looks like this: auto range(T)(T[] array) { struct ArrayRange(T) { private T[] _array; bool empty() @property { return _array.length == 0; } ref T front() { return _array[0]; } void popFront() { _array = _array[1 .. $]; } } return ArrayRange!T(array); } By implementing the iterator as struct, there's no need to allocate GC memory for a new instance. The only member is a slice of the source array, which again avoids allocation. Look at the implementation of popFront. Rather than requiring a separate variable to track the current array index, it slices the first element out of _array so that the next element is always at index 0, consequently shortening.length of the slice by 1 so that after every item has been consumed, _array.length will be 0. This makes the implementation of both empty and front dead simple. ArrayRange can be a Voldemort type because there is no need to declare its type in any algorithm it's passed to. As long as the algorithms are implemented as templates, the compiler can infer everything that needs to be known for them to work. Moreover, thanks to UFCS, it's possible to call this function as if it were an array property. Given an array called myArray, the following is valid: auto range = myArray.range; Next, we need a template to go in the other direction. This needs to allocate a new array, walk the iterator, and store the result of each call to element in the new array. Its implementation is as follows: T[] array(T, R)(R range) @property { T[] ret; while(!range.empty) { ret ~= range.front; range.popFront(); } return ret; } This can be called after any operation that produces any Range in order to get an array. If the range comes at the end of one or more lazy operations, this will cause all of them to execute simply by the call to popFront (we'll see how shortly). In that case, no allocations happen except as needed in this function when elements are appended to ret. Again, the appending strategy here is naïve, so there's room for improvement in order to reduce the potential number of allocations. Now it's time to implement an algorithm to make use of our new range interface. The implementation of filter The filter function isn't going to do any filtering at all. If that sounds counterintuitive, recall that we want the function to be lazy; all of the work should be delayed until it is actually needed. The way to accomplish that is to wrap the input range in a custom range that has an internal implementation of the filtering algorithm. We'll call this wrapper FilteredRange. It will be a Voldemort type, which is local to the filter function. Before seeing the entire implementation, it will help to examine it in pieces as there's a bit more to see here than with ArrayRange. FilteredRange has only one member: private R _source; R is the type of the range that is passed to filter. The empty and front functions simply delegate to the source range, so we'll look at popFront next: void popFront() { _source.popFront(); skipNext(); } This will always pop the front from the source range before running the filtering logic, which is implemented in the private helper function skipNext: private void skipNext() { while(!_source.empty && !unaryFun!predicate(_source.front)) _source.popFront(); } This function tests the result of _source.front against the predicate. If it doesn't match, the loop moves on to the next element, repeating the process until either a match is found or the source range is empty. So, imagine you have an array arr of the values [1,2,3,4]. Given what we've implemented so far, what would be the result of the following chain? Let's have a look at the following code: arr.range.filter!(x => (x & 1) == 0).front; As mentioned previously, front delegates to _source.front. In this case, the source range is an instance of ArrayRange; its front returns _source[0]. Since popFront was never called at any point, the first value in the array was never tested against the predicate. Therefore, the return value is 1, a value which doesn't match the predicate. The first value returned by front should be 2, since it's the first even number in the array. In order to make this behave as expected, FilteredRange needs to ensure the wrapped range is in a state such that either the first call to front will properly return a filtered value, or empty will return true, meaning there are no values in the source range that match the predicate. This is best done in the constructor: this(R source) { _source = source; skipNext(); } Calling skipNext in the constructor ensures that the first element of the source range is tested against the predicate; however, it does mean that our filter implementation isn't completely lazy. In an extreme case, that _source contains no values that match the predicate; it's actually going to be completely eager. The source elements will be consumed as soon as the range is instantiated. Not all algorithms will lend themselves to 100 percent laziness. No matter what we have here is lazy enough. Wrapped up inside the filter function, the whole thing looks like this: import std.functional; auto filter(alias predicate, R)(R source) if(is(typeof(unaryFun!predicate))) { struct FilteredRange { private R _source; this(R source) { _source = source; skipNext(); } bool empty() { return _source.empty; } auto ref front() { return _source.front; } void popFront() { _source.popFront(); skipNext(); } private void skipNext() { while(!_source.empty && !unaryFun!predicate(_source.front)) _source.popFront(); } } return FilteredRange(source); } It might be tempting to take the filtering logic out of the skipNext method and add it to front, which is another way to guarantee that it's performed on every element. Then no work would need to be done in the constructor and popFront would simply become a wrapper for _source.popFront. The problem with that approach is that front can potentially be called multiple times without calling popFront in between. Aside from the fact that it should return the same value each time, which can easily be accommodated, this still means the current element will be tested against the predicate on each call. That's unnecessary work. As a general rule, any work that needs to be done inside a range should happen as a result of calling popFront, leaving front to simply focus on returning the current element. The test With the implementation complete, it's time to put it through its paces. Here are a few test cases in a unittest block: unittest { auto arr = [10, 13, 300, 42, 121, 20, 33, 45, 50, 109, 18]; auto result = arr.range .filter!(x => x < 100 ) .filter!(x => (x & 1) == 0) .array!int(); assert(result == [10,42,20,50,18]); arr = [1,2,3,4,5,6]; result = arr.range.filter!(x => (x & 1) == 0).array!int; assert(result == [2, 4, 6]); arr = [1, 3, 5, 7]; auto r = arr.range.filter!(x => (x & 1) == 0); assert(r.empty); arr = [2,4,6,8]; result = arr.range.filter!(x => (x & 1) == 0).array!int; assert(result == arr); } Assuming all of this has been saved in a file called filter.d, the following will compile it for unit testing: dmd -unittest -main filter That should result in an executable called filter which, when executed, should print nothing to the screen, indicating a successful test run. Notice the test that calls empty directly on the returned range. Sometimes, we might not need to convert a range to a container at the end of the chain. For example, to print the results, it's quite reasonable to iterate a range directly. Why allocate when it isn't necessary? The real ranges The purpose of the preceding exercise was to get a feel of the motivation behind D ranges. We didn't develop a concrete type called Range, just an interface. D does the same, with a small set of interfaces defining ranges for different purposes. The interface we developed exactly corresponds to the basic kind of D range, called an input range, one of one of two foundational range interfaces in D (the upshot of that is that both ArrayRange and FilteredRange are valid input ranges, though, as we'll eventually see, there's no reason to use either outside of this article). There are also certain optional properties that ranges might have, which, when present, some algorithms might take advantage of. We'll take a brief look at the range interfaces now, then see more details regarding their usage in the next section. Input ranges This foundational range is defined to be anything from which data can be sequentially read via the three primitives empty, front, and popFront. The first two should be treated as properties, meaning they can be variables or functions. This is important to keep in mind when implementing any generic range-based algorithm yourself; calls to these two primitives should be made without parentheses. The three higher-order range interfaces, we'll see shortly, build upon the input range interface. To reinforce a point made earlier, one general rule to live by when crafting input ranges is that consecutive calls to front should return the same value until popFront is called; popFront prepares an element to be returned and front returns it. Breaking this rule can lead to unexpected consequences when working with range-based algorithms, or even foreach. Input ranges are somewhat special in that they are recognized by the compiler. The opApply enables iteration of a custom type with a foreach loop. An alternative is to provide an implementation of the input range primitives. When the compiler encounters a foreach loop, it first checks to see if the iterated instance is of a type that implements opApply. If not, it then checks for the input range interface and, if found, rewrites the loop. In a given range someRange, take for example the following loop: foreach(e; range) { ... } This is rewritten to something like this: for(auto __r = range; !__r.empty; __r.popFront()) { auto e = __r.front; ... } This has implications. To demonstrate, let's use the ArrayRange from earlier: auto ar = [1, 2, 3, 4, 5].range; foreach(n; ar) { writeln(n); } if(!ar.empty) writeln(ar.front); The last line prints 1. If you're surprised, look back up at the for loop that the compiler generates. ArrayRange is a struct, so when it's assigned to __r, a copy is generated. The slices inside, ar and __r, point to the same memory, but their .ptr and .length properties are distinct. As the length of the __r slice decreases, the length of the ar slice remains the same. When implementing generic algorithms that loop over a source range, it's not a good idea to assume the original range will not be consumed by the loop. If it's a class instead of struct, it will be consumed by the loop, as classes are references types. Furthermore, there are no guarantees about the internal implementation of a range. There could be struct-based ranges that are actually consumed in a foreach loop. Generic functions should always assume this is the case. Test if a given range type R is an input range: import std.range : isInputRange; static assert(isInputRange!R); There are no special requirements on the return value of the front property. Elements can be returned by value or by reference, they can be qualified or unqualified, they can be inferred via auto, and so on. Any qualifiers, storage classes, or attributes that can be applied to functions and their return values can be used with any range function, though it might not always make sense to do so. Forward ranges The most basic of the higher-order ranges is the forward range. This is defined as an input range that allows its current point of iteration to be saved via a primitive appropriately named save. Effectively, the implementation should return a copy of the current state of the range. For ranges that are struct types, it could be as simple as: auto save() { return this; } For ranges that are class types, it requires allocating a new instance: auto save() { return new MyForwardRange(this); } Forward ranges are useful for implementing algorithms that require a look ahead. For example, consider the case of searching a range for a pair of adjacent elements that pass an equality test: auto saved = r.save; if(!saved.empty) { for(saved.popFront(); !saved.empty; r.popFront(), saved.popFront()) { if(r.front == saved.front) return r; } } return saved; Because this uses a for loop and not a foreach loop, the ranges are iterated directly and are going to be consumed. Before the loop begins, a copy of the current state of the range is made by calling r.save. Then, iteration begins over both the copy and the original. The original range is positioned at the first element, and the call to saved.popFront in the beginning of the loop statement positions the saved range at the second element. As the ranges are iterated in unison, the comparison is always made on adjacent elements. If a match is found, r is returned, meaning that the returned range is positioned at the first element of a matching pair. If no match is found, saved is returned—since it's one element ahead of r, it will have been consumed completely and its empty property will be true. The preceding example is derived from a more generic implementation in Phobos, std.range.findAdjacent. It can use any binary (two argument) Boolean condition to test adjacent elements and is constrained to only accept forward ranges. It's important to understand that calling save usually does not mean a deep copy, but it sometimes can. If we were to add a save function to the ArrayRange from earlier, we could simply return this; the array elements would not be copied. A class-based range, on the other hand, will usually perform a deep copy because it's a reference type. When implementing generic functions, you should never make the assumption that the range does not require a deep copy. For example, given a range r: auto saved = r; // INCORRECT!! auto saved = r.save; // Correct. If r is a class, the first line is almost certainly going to result in incorrect behavior. It would in the preceding example loop. To test if a given range R is a forward range: import std.range : isForwardRange; static assert(isForwardRange!R); Bidirectional ranges A bidirectional range is a forward range that includes the primitives back and popBack, allowing a range to be sequentially iterated in reverse. The former should be a property, the latter a function. Given a bidirectional range r, the following forms of iteration are possible: foreach_reverse(e; r) writeln(e); for(; !r.empty; r.popBack) writeln(r.back); } Like its cousin foreach, the foreach_reverse loop will be rewritten into a for loop that does not consume the original range; the for loop shown here does consume it. Test whether a given range type R is a bidirectional range: import std.range : isBidirectionalRange; static assert(isBidirectionalRange!R); Random-access ranges A random-access range is a bidirectional range that supports indexing and is required to provide a length primitive unless it's infinite (two topics we'll discuss shortly). For custom range types, this is achieved via the opIndex operator overload. It is assumed that r[n] returns a reference to the (n+1)th element of the range, just as when indexing an array. Test whether a given range R is a random-access range: import std.range : isRandomAccessRange; static assert(isRandomAccessRange!R); Dynamic arrays can be treated as random-access ranges by importing std.array. This pulls functions into scope that accept dynamic arrays as parameters and allows them to pass all the isRandomAccessRange checks. This makes our ArrayRange from earlier obsolete. Often, when you need a random-access range, it's sufficient just to use an array instead of creating a new range type. However, char and wchar arrays (string and wstring) are not considered random-access ranges, so they will not work with any algorithm that requires one. Getting a random-access range from char[] and wchar[] Recall that a single Unicode character can be composed of multiple elements in a char or wchar array, which is an aspect of strings that would seriously complicate any algorithm implementation that needs to directly index the array. To get around this, the thing to do in a general case is to convert char[] and wchar[] into dchar[]. This can be done with std.utf.toUTF32, which encodes UTF-8 and UTF-16 strings into UTF-32 strings. Alternatively, if you know you're only working with ASCII characters, you can use std.string.representation to get ubyte[] or ushort[] (on dstring, it returns uint[]). Output ranges The output range is the second foundational range type. It's defined to be anything that can be sequentially written to via the primitive put. Generally, it should be implemented to accept a single parameter, but the parameter could be a single element, an array of elements, a pointer to elements, or another data structure containing elements. When working with output ranges, never call the range's implementation of put directly; instead, use the Phobos' utility function std.range.put. It will call the range's implementation internally, but it allows for a wider range of argument types. Given a range r and element e, it would look like this: import std.range : put; put(r, e); The benefit here is if e is anything other than a single element, such as an array or another range, the global put does what is necessary to pull elements from it and put them into r one at a time. With this, you can define and implement a simple output range that might look something like this: MyOutputRange(T) { private T[] _elements; void put(T elem) { _elements ~= elem; } } Now, you need not worry about calling put in a loop, or overloading it to accept collections of T. For example, let's have a look at the following code: MyOutputRange!int range; auto nums = [11, 22, 33, 44, 55]; import std.range : put; put(range, nums); Note that using UFCS here will cause compilation to fail, as the compiler will attempt to call MyOutputRange.put directly, rather than the utility function. However, it's fine to use UFCS when the first parameter is a dynamic array. This allows arrays to pass the isOutputRange predicate Test whether a given range R is an output range: import std.range : isOutputRange; static assert(isOutputRange!(R, E)); Here, E is the type of element accepted by R.put. Optional range primitives In addition to the five primary range types, some algorithms in Phobos are designed to look for optional primitives that can be used as an optimization or, in some cases, a requirement. There are predicate templates in std.range that allow the same options to be used outside of Phobos. hasLength Ranges that expose a length property can reduce the amount of work needed to determine the number of elements they contain. A great example is the std.range.walkLength function, which will determine and return the length of any range, whether it has a length primitive or not. Given a range that satisfies the std.range.hasLength predicate, the operation becomes a call to the length property; otherwise, the range must be iterated until it is consumed, incrementing a variable every time popFront is called. Generally, length is expected to be a O(1) operation. If any given implementation cannot meet that expectation, it should be clearly documented as such. For non-infinite random-access ranges, length is a requirement. For all others, it's optional. isInfinite An input range with an empty property, which is implemented as a compile-time value set to false is considered an infinite range. For example, let's have a look at the following code: struct IR { private uint _number; enum empty = false; auto front() { return _number; } void popFront() { ++_number; } } Here, empty is a manifest constant, but it could alternatively be implemented as follows: static immutable empty = false; The predicate template std.range.isInfinite can be used to identify infinite ranges. Any range that is always going to return false from empty should be implemented to pass isInfinite. Wrapper ranges (such as the FilterRange we implemented earlier) in some functions might check isInfinite and customize an algorithm's behavior when it's true. Simply returning false from an empty function will break this, potentially leading to infinite loops or other undesired behavior. Other options There are a handful of other optional primitives and behaviors, as follows: hasSlicing: This returns true for any forward range that supports slicing. There are a set of requirements specified by the documentation for finite versus infinite ranges and whether opDollar is implemented. hasMobileElements: This is true for any input range whose elements can be moved around in the memory (as opposed to copied) via the primitives moveFront, moveBack, and moveAt. hasSwappableElements: This returns true if a range supports swapping elements through its interface. The requirements are different depending on the range type. hasAssignableElements: This returns true if elements are assignable through range primitives such as front, back, or opIndex. At http://dlang.org/phobos/std_range_primitives.html, you can find specific documentation for all of these tests, including any special requirements that must be implemented by a range type to satisfy them. Ranges in use The key concept to understand ranges in the general case is that, unless they are infinite, they are consumable. In idiomatic usage, they aren't intended to be kept around, adding and removing elements to and from them as if they were some sort of container. A range is generally created only when needed, passed to an algorithm as input, then ultimately consumed, often at the end of a chain of algorithms. Even forward ranges and output ranges with their save and put primitives usually aren't intended to live long beyond an algorithm. That's not to say it's forbidden to keep a range around; some might even be designed for long life. For example, the random number generators in std.random are all ranges that are intended to be reused. However, idiomatic usage in D generally means lazy, fire-and-forget ranges that allow algorithms to operate on data from any source. For most programs, the need to deal with ranges directly should be rare; most code will be passing ranges to algorithms, then either converting the result to a container or iterating it with a foreach loop. Only when implementing custom containers and range-based algorithms is it necessary to implement a range or call a range interface directly. Still, understanding what's going on under the hood helps in understanding the algorithms in Phobos, even if you never need to implement a range or algorithm yourself. That's the focus of the remainder of this article. Custom ranges When implementing custom ranges, some thought should be given to the primitives that need to be supported and how to implement them. Since arrays support a number of primitives out of the box, it might be tempting to return a slice from a custom type, rather than struct wrapping an array or something else. While that might be desirable in some cases, keep in mind that a slice is also an output range and has assignable elements (unless it's qualified as const or immutable, but those can be cast away). In many cases, what's really wanted is an input range that can never be modified; one that allows iteration and prevents unwanted allocations. A custom range should be as lightweight as possible. If a container uses an array or pointer internally, the range should operate on a slice of the array, or a copy of the pointer, rather than a copy of the data. This is especially true for the save primitive of a forward iterator; it could be called more than once in a chain of algorithms, so an implementation that requires deep copying would be extremely suboptimal (not to mention problematic for a range that requires ref return values from front). Now we're going to implement two actual ranges that demonstrate two different scenarios. One is intended to be a one-off range used to iterate a container, and one is suited to sticking around for as long as needed. Both can be used with any of the algorithms and range operations in Phobos. Getting a range from a stack Here's a barebones, simple stack implementation with the common operations push, pop, top, and isEmpty (named to avoid confusion with the input range interface). It uses an array to store its elements, appending them in the push operation and decreasing the array length in the pop operation. The top of the stack is always _array[$-1]: struct Stack(T) { private T[] _array; void push(T element) { _array ~= element; } void pop() { assert(!isEmpty); _array.length -= 1; } ref T top() { assert(!isEmpty); return _array[$-1]; } bool isEmpty() { return _array.length == 0; } } Rather than adding an opApply to iterate a stack directly, we want to create a range to do the job for us so that we can use it with all of those algorithms. Additionally, we don't want the stack to be modified through the range interface, so we should declare a new range type internally. That might look like this: private struct Range { T[] _elements; bool empty() { return _elements.length == 0; } T front() { return _elements[$-1]; } void popFront() { _elements.length -= 1; } } Add this anywhere you'd like inside the Stack declaration. Note the iteration of popFront. Effectively, this range will iterate the elements backwards. Since the end of the array is the top of the stack, that means it's iterating the stack from the top to the bottom. We could also add back and popBack primitives that iterate from the bottom to the top, but we'd also have to add a save primitive since bidirectional ranges must also be forward ranges. Now, all we need is a function to return a Range instance: auto elements() { return Range(_array); } Again, add this anywhere inside the Stack declaration. A real implementation might also add the ability to get a range instance from slicing a stack. Now, test it out: Stack!int stack; foreach(i; 0..10) stack.push(i); writeln("Iterating..."); foreach(i; stack.elements) writeln(i); stack.pop(); stack.pop(); writeln("Iterating..."); foreach(i; stack.elements) writeln(i); One of the great side effects of this sort of range implementation is that you can modify the container behind the range's back and the range doesn't care: foreach(i; stack.elements) { stack.pop(); writeln(i); } writeln(stack.top); This will still print exactly what was in the stack at the time the range was created, but the writeln outside the loop will cause an assertion failure because the stack will be empty by then. Of course, it's still possible to implement a container that can cause its ranges not just to become stale, but to become unstable and lead to an array bounds error or an access violation or some such. However, D's slices used in conjunction with structs give a good deal of flexibility. A name generator range Imagine that we're working on a game and need to generate fictional names. For this example, let's say it's a music group simulator and the names are those of group members. We'll need a data structure to hold the list of possible names. To keep the example simple, we'll implement one that holds both first and last names: struct NameList { private: string[] _firstNames; string[] _lastNames; struct Generator { private string[] _first; private string[] _last; private string _next; enum empty = false; this(string[] first, string[] last) { _first = first; _last = last; popFront(); } string front() { return _next; } void popFront() { import std.random : uniform; auto firstIdx = uniform(0, _first.length); auto lastIdx = uniform(0, _last.length); _next = _first[firstIdx] ~ " " ~ _last[lastIdx]; } } public: auto generator() { return Generator(_firstNames, _lastNames); } } The custom range is in the highlighted block. It's a struct called Generator that stores two slices, _first and _last, which are both initialized in its only constructor. It also has a field called _next, which we'll come back to in a minute. The goal of the range is to provide an endless stream of randomly generated names, which means it doesn't make sense for its empty property to ever return true. As such, it is marked as an infinite range by the manifest constant implementation of empty that is set to false. This range has a constructor because it needs to do a little work to prepare itself before front is called for the first time. All of the work is done in popFront, which the constructor calls after the member variables are set up. Inside popFront, you can see that we're using the std.random.uniform function. By default, this function uses a global random number generator and returns a value in the range specified by the parameters, in this case 0 and the length of each array. The first parameter is inclusive and the second is exclusive. Two random numbers are generated, one for each array, and then used to combine a first name and a last name to store in the _next member, which is the value returned when front is called. Remember, consecutive calls to front without any calls to popFront should always return the same value. std.random.uniform can be configured to use any instance of one of the random number generator implementations in Phobos. It can also be configured to treat the bounds differently. For example, both could be inclusive, exclusive, or the reverse of the default. See the documentation at http://dlang.org/phobos/std_random.html for details. The generator property of NameList returns an instance of Generator. Presumably, the names in a NameList would be loaded from a file on disk, or a database, or perhaps even imported at compile-time. It's perfectly fine to keep a single Generator instance handy for the life of the program as implemented. However, if the NameList instance backing the range supported reloading or appending, not all changes would be reflected in the range. In that scenario, it's better to go through generator every time new names need to be generated. Now, let's see how our custom range might be used: auto nameList = NameList( ["George", "John", "Paul", "Ringo", "Bob", "Jimi", "Waylon", "Willie", "Johnny", "Kris", "Frank", "Dean", "Anne", "Nancy", "Joan", "Lita", "Janice", "Pat", "Dionne", "Whitney", "Donna", "Diana"], ["Harrison", "Jones", "Lennon", "Denver", "McCartney", "Simon", "Starr", "Marley", "Dylan", "Hendrix", "Jennings", "Nelson", "Cash", "Mathis", "Kristofferson", "Sinatra", "Martin", "Wilson", "Jett", "Baez", "Ford", "Joplin", "Benatar", "Boone", "Warwick", "Houston", "Sommers", "Ross"] ); import std.range : take; auto names = nameList.generator.take(4); writeln("These artists want to form a new band:"); foreach(artist; names) writeln(artist); First up, we initialize a NameList instance with two array literals, one of first names and one of last names. Next, the highlighted line is where the range is used. We call nameList.generator and then, using UFCS, pass the returned Generator instance to std.range.take. This function creates a new lazy range containing a number of elements, four in this case, from the source range. In other words, the result is the equivalent of calling front and popFront four times on the range returned from nameList.generator, but since it's lazy, the popping doesn't occur until the foreach loop. That loop produces four randomly generated names that are each written to standard output. One iteration yielded the following names for me: These artists want to form a new band: Dionne WilsonJohnny StarrRingo SinatraDean Kristofferson Other considerations The Generator range is infinite, so it doesn't need length. There should never be a need to index it, iterate it in reverse, or assign any values to it. It has exactly the interface it needs. But it's not always so obvious where to draw the line when implementing a custom range. Consider the interface for a range from a queue data structure. A basic queue implementation allows two operations to add and remove items—enqueue and dequeue (or push and pop if you prefer). It provides the self-describing properties empty and length. What sort of interface should a range from a queue implement? An input range with a length property is perhaps the most obvious, reflecting the interface of the queue itself. Would it make sense to add a save property? Should it also be a bidirectional range? What about indexing? Should the range be random-access? There are queue implementations out there in different languages that allow indexing, either through an operator overload or a function such as getElementAt. Does that make sense? Maybe. More importantly, if a queue doesn't allow indexing, does it make sense for a range produced from that queue to allow it? What about slicing? Or assignable elements? For our queue type at least, there are no clear-cut answers to these questions. A variety of factors come into play when choosing which range primitives to implement, including the internal data structure used to implement the queue, the complexity requirements of the primitives involved (indexing should be an O(1) operation), whether the queue was implemented to meet a specific need or is a more general-purpose data structure, and so on. A good rule of thumb is that if a range can be made a forward range, then it should be. Custom algorithms When implementing custom, range-based algorithms, it's not enough to just drop an input range interface onto the returned range type and be done with it. Some consideration needs to be given to the type of range used as input to the function and how its interface should affect the interface of the returned range. Consider the FilteredRange we implemented earlier, which provides the minimal input range interface. Given that it's a wrapper range, what happens when the source range is an infinite range? Let's look at it step by step. First, an infinite range is passed in to filter. Next, it's wrapped up in a FilteredRange instance that's returned from the function. The returned range is going to be iterated at some point, either directly by the caller or somewhere in a chain of algorithms. There's one problem, though: with a source range that's infinite, the FilteredRange instance can never be consumed. Because its empty property simply wraps that of the source range, it's always going to return false if the source range is infinite. However, since FilteredRange does not implement empty as a compile-time constant, it will never match the isInfiniteRange predicate. This will cause any algorithm that makes that check to assume it's dealing with a finite range and, if iterating it, enter into an infinite loop. Imagine trying to track down that bug. One option is to prohibit infinite ranges with a template constraint, but that's too restrictive. The better way around this potential problem is to check the source range against the isInfinite predicate inside the FilteredRange implementation. Then, the appropriate form of the empty primitive of FilteredRange can be configured with conditional compilation: import std.range : isInfinite; static if(isInfinite!T) enum empty = false; else bool empty(){ return _source.empty; } With this, FilteredRange will satisfy the isInfinite predicate when it wraps an infinite range, avoiding the infinite loop bug. Another good rule of thumb is that a wrapper range should implement as many of the primitives provided by the source range as it reasonably can. If the range returned by a function has fewer primitives than the one that went in, it is usable with fewer algorithms. But not all ranges can accommodate every primitive. Take FilteredRange as an example again. It could be configured to support the bidirectional interface, but that would have a bit of a performance impact as the constructor would have to find the last element in the source range that satisfies the predicate in addition to finding the first, so that both front and back are primed to return the correct values. Rather than using conditional compilation, std.algorithm provides two functions, filter and filterBidirectional, so that users must explicitly choose to use the latter version. A bidirectional range passed to filter will produce a forward range, but the latter maintains the interface. The random-access interface, on the other hand, makes no sense on FilteredRange. Any value taken from the range must satisfy the predicate, but if users can randomly index the range, they could quite easily get values that don't satisfy the predicate. It could work if the range was made eager rather than lazy. In that case, it would allocate new storage and copy all the elements from the source that satisfies the predicate, but that defeats the purpose of using lazy ranges in the first place. Summary In this article, we've taken an introductory look at ranges in D and how to implement them into containers and algorithms. For more information on ranges and their primitives and traits, see the documentation at http://dlang.org/phobos/std_range.html. Resources for Article: Further resources on this subject: Transactions and Operators [article] Using Protocols and Protocol Extensions [article] Hand Gesture Recognition Using a Kinect Depth Sensor [article]

0
0
2030

article-image-creating-graph-application-python-neo4j-gephi-linkuriousjs

Greg Roberts

12 Oct 2015

13 min read

Creating a graph application with Python, Neo4j, Gephi & Linkurious.js

Greg Roberts

12 Oct 2015

13 min read

I love Python, and to celebrate Packt Python week, I’ve spent some time developing an app using some of my favorite tools. The app is a graph visualization of Python and related topics, as well as showing where all our content fits in. The topics are all StackOverflow tags, related by their co-occurrence in questions on the site. The app is available to view at http://gregroberts.github.io/ and in this blog, I’m going to discuss some of the techniques I used to construct the underlying dataset, and how I turned it into an online application. Graphs, not charts Graphs are an incredibly powerful tool for analyzing and visualizing complex data. In recent years, many different graph database engines have been developed to make use of this novel manner of representing data. These databases offer many benefits over traditional, relational databases because of how the data is stored and accessed. Here at Packt, I use a Neo4j graph to store and analyze data about our business. Using the Cypher query language, it’s easy to express complicated relations between different nodes succinctly. It’s not just the technical aspect of graphs which make them appealing to work with. Seeing the connections between bits of data visualized explicitly as in a graph helps you to see the data in a different light, and make connections that you might not have spotted otherwise. This graph has many uses at Packt, from customer segmentation to product recommendations. In the next section, I describe the process I use to generate recommendations from the database. Make the connection For product recommendations, I use what’s known as a hybrid filter. This considers both content based filtering (product x and y are about the same topic) and collaborative filtering (people who bought x also bought y). Each of these methods has strengths and weaknesses, so combining them into one algorithm provides a more accurate signal. The collaborative aspect is straightforward to implement in Cypher. For a particular product, we want to find out which other product is most frequently bought alongside it. We have all our products and customers stored as nodes, and purchases are stored as edges. Thus, the Cypher query we want looks like this: MATCH (n:Product {title:’Learning Cypher’})-[r:purchased*2]-(m:Product) WITH m.title AS suggestion, count(distinct r)/(n.purchased+m.purchased) AS alsoBought WHERE m<>n RETURN* ORDER BY alsoBought DESC and will very efficiently return the most commonly also purchased product. When calculating the weight, we divide by the total units sold of both titles, so we get a proportion returned. We do this so we don’t just get the titles with the most units; we’re effectively calculating the size of the intersection of the two titles’ audiences relative to their overall audience size. The content side of the algorithm looks very similar: MATCH (n:Product {title:’Learning Cypher’})-[r:is_about*2]-(m:Product) WITH m.title AS suggestion, count(distinct r)/(length(n.topics)+length(m.topics)) AS alsoAbout WHERE m<>n RETURN * ORDER BY alsoAbout DESC Implicit in this algorithm is knowledge that a title is_about a topic of some kind. This could be done manually, but where’s the fun in that? In Packt’s domain there already exists a huge, well moderated corpus of technology concepts and their usage: StackOverflow. The tagging system on StackOverflow not only tells us about all the topics developers across the world are using, it also tells us how those topics are related, by looking at the co-occurrence of tags in questions. So in our graph, StackOverflow tags are nodes in their own right, which represent topics. These nodes are connected via edges, which are weighted to reflect their co-occurrence on StackOverflow: edge_weight(n,m) = (Number of questions tagged with both n & m)/(Number questions tagged with n or m) So, to find topics related to a given topic, we could execute a query like this: MATCH (n:StackOverflowTag {name:'Matplotlib'})-[r:related_to]-(m:StackOverflowTag) RETURN n.name, r.weight, m.name ORDER BY r.weight DESC LIMIT 10 Which would return the following: | n.name | r.weight | m.name ----+------------+----------+-------------------- 1 | Matplotlib | 0.065699 | Plot 2 | Matplotlib | 0.045678 | Numpy 3 | Matplotlib | 0.029667 | Pandas 4 | Matplotlib | 0.023623 | Python 5 | Matplotlib | 0.023051 | Scipy 6 | Matplotlib | 0.017413 | Histogram 7 | Matplotlib | 0.015618 | Ipython 8 | Matplotlib | 0.013761 | Matplotlib Basemap 9 | Matplotlib | 0.013207 | Python 2.7 10 | Matplotlib | 0.012982 | Legend There are many, more complex relationships you can define between topics like this, too. You can infer directionality in the relationship by looking at the local network, or you could start constructing Hyper graphs using the extensive StackExchange API. So we have our topics, but we still need to connect our content to topics. To do this, I’ve used a two stage process. Step 1 – Parsing out the topics We take all the copy (words) pertaining to a particular product as a document representing that product. This includes the title, chapter headings, and all the copy on the website. We use this because it’s already been optimized for search, and should thus carry a fair representation of what the title is about. We then parse this document and keep all the words which match the topics we’ve previously imported. #...code for fetching all the copy for all the products key_re = 'W(%s)W' % '|'.join(re.escape(i) for i in topic_keywords) for i in documents tags = re.findall(key_re, i[‘copy’]) i['tags'] = map(lambda x: tag_lookup[x],tags) Having done this for each product, we have a bag of words representing each product, where each word is a recognized topic. Step 2 – Finding the information From each of these documents, we want to know the topics which are most important for that document. To do this, we use the tf-idf algorithm. tf-idf stands for term frequency, inverse document frequency. The algorithm takes the number of times a term appears in a particular document, and divides it by the proportion of the documents that word appears in. The term frequency factor boosts terms which appear often in a document, whilst the inverse document frequency factor gets rid of terms which are overly common across the entire corpus (for example, the term ‘programming’ is common in our product copy, and whilst most of the documents ARE about programming, this doesn’t provide much discriminating information about each document). To do all of this, I use python (obviously) and the excellent scikit-learn library. Tf-idf is implemented in the class sklearn.feature_extraction.text.TfidfVectorizer. This class has lots of options you can fiddle with to get more informative results. import sklearn.feature_extraction.text as skt tagger = skt.TfidfVectorizer(input = 'content', encoding = 'utf-8', decode_error = 'replace', strip_accents = None, analyzer = lambda x: x, ngram_range = (1,1), max_df = 0.8, min_df = 0.0, norm = 'l2', sublinear_tf = False) It’s a good idea to use the min_df & max_df arguments of the constructor so as to cut out the most common/obtuse words, to get a more informative weighting. The ‘analyzer’ argument tells it how to get the words from each document, in our case, the documents are already lists of normalized words, so we don’t need anything additional done. #create vectors of all the documents vectors = tagger.fit_transform(map(lambda x: x['tags'],rows)).toarray() #get back the topic names to map to the graph t_map = tagger.get_feature_names() jobs = [] for ind, vec in enumerate(vectors): features = filter(lambda x: x[1]>0, zip(t_map,vec)) doc = documents[ind] for topic, weight in features: job = ‘’’MERGE (n:StackOverflowTag {name:’%s’}) MERGE (m:Product {id:’%s’}) CREATE UNIQUE (m)-[:is_about {source:’tf_idf’,weight:%d}]-(n) ’’’ % (topic, doc[‘id’], weight) jobs.append(job) We then execute all of the jobs using Py2neo’s Batch functionality. Having done all of this, we can now relate products to each other in terms of what topics they have in common: MATCH (n:Product {isbn10:'1783988363'})-[r:is_about]-(a)-[q:is_about]-(m:Product {isbn10:'1783289007'}) WITH a.name as topic, r.weight+q.weight AS weight RETURN topic ORDER BY weight DESC limit 6 Which returns: | topic ---+------------------ 1 | Machine Learning 2 | Image 3 | Models 4 | Algorithm 5 | Data 6 | Python Huzzah! I now have a graph into which I can throw any piece of content about programming or software, and it will fit nicely into the network of topics we’ve developed. Take a breath So, that’s how the graph came to be. To communicate with Neo4j from Python, I use the excellent py2neo module, developed by Nigel Small. This module has all sorts of handy abstractions to allow you to work with nodes and edges as native Python objects, and then update your Neo instance with any changes you’ve made. The graph I’ve spoken about is used for many purposes across the business, and has grown in size and scope significantly over the last year. For this project, I’ve taken from this graph everything relevant to Python. I started by getting all of our content which is_about Python, or about a topic related to python: titles = [i.n for i in graph.cypher.execute('''MATCH (n)-[r:is_about]-(m:StackOverflowTag {name:'Python'}) return distinct n''')] t2 = [i.n for i in graph.cypher.execute('''MATCH (n)-[r:is_about]-(m:StackOverflowTag)-[:related_to]-(m:StackOverflowTag {name:'Python'}) where has(n.name) return distinct n''')] titles.extend(t2) then hydrated this further by going one or two hops down each path in various directions, to get a large set of topics and content related to Python. Visualising the graph Since I started working with graphs, two visualisation tools I’ve always used are Gephi and Sigma.js. Gephi is a great solution for analysing and exploring graphical data, allowing you to apply a plethora of different layout options, find out more about the statistics of the network, and to filter and change how the graph is displayed. Sigma.js is a lightweight JavaScript library which allows you to publish beautiful graph visualizations in a browser, and it copes very well with even very large graphs. Gephi has a great plugin which allows you to export your graph straight into a web page which you can host, share and adapt. More recently, Linkurious have made it their mission to bring graph visualization to the masses. I highly advise trying the demo of their product. It really shows how much value it’s possible to get out of graph based data. Imagine if your Customer Relations team were able to do a single query to view the entire history of a case or customer, laid out as a beautiful graph, full of glyphs and annotations. Linkurious have built their product on top of Sigma.js, and they’ve made available much of the work they’ve done as the open source Linkurious.js. This is essentially Sigma.js, with a few changes to the API, and an even greater variety of plugins. On Github, each plugin has an API page in the wiki and a downloadable demo. It’s worth cloning the repository just to see the things it’s capable of! Publish It! So here’s the workflow I used to get the Python topic graph out of Neo4j and onto the web. Use Py2neo to graph the subgraph of content and topics pertinent to Python, as described above Add to this some other topics linked to the same books to give a fuller picture of the Python “world” Add in topic-topic edges and product-product edges to show the full breadth of connections observed in the data Export all the nodes and edges to csv files Import node and edge tables into Gephi. The reason I’m using Gephi as a middle step is so that I can fiddle with the visualisation in Gephi until it looks perfect. The layout plugin in Sigma is good, but this way the graph is presentable as soon as the page loads, the communities are much clearer, and I’m not putting undue strain on browsers across the world! The layout of the graph has been achieved using a number of plugins. Instead of using the pre-installed ForceAtlas layouts, I’ve used the OpenOrd layout, which I feel really shows off the communities of a large graph. There’s a really interesting and technical presentation about how this layout works here. Export the graph into gexf format, having applied some partition and ranking functions to make it more clear and appealing. Now it’s all down to Linkurious and its various plugins! You can explore the source code of the final page to see all the details, but here I’ll give an overview of the different plugins I’ve used for the different parts of the visualisation: First instantiate the graph object, pointing to a container (note the CSS of the container, without this, the graph won’t display properly: <style type="text/css"> #container { max-width: 1500px; height: 850px; margin: auto; background-color: #E5E5E5; } </style> … <div id="container"></div> … <script> s= new sigma({ container: 'container', renderer: { container: document.getElementById('container'), type: 'canvas' }, settings: { … } }); sigma.parsers.gexf - used for (trivially!) importing a gexf file into a sigma instance sigma.parsers.gexf( 'static/data/Graph1.gexf', s, function(s) { //callback executed once the data is loaded, use this to set up any aspects of the app which depend on the data }); sigma.plugins.filter - Adds the ability to very simply hide nodes/edges based on a callback function which returns a boolean. This powers the filtering widgets on the page. <input class="form-control" id="min-degree" type="range" min="0" max="0" value="0"> … function applyMinDegreeFilter(e) { var v = e.target.value; $('#min-degree-val').textContent = v; filter .undo('min-degree') .nodesBy( function(n, options) { return this.graph.degree(n.id) >= options.minDegreeVal; },{ minDegreeVal: +v }, 'min-degree' ) .apply(); }; $('#min-degree').change(applyMinDegreeFilter); sigma.plugins.locate - Adds the ability to zoom in on a single node or collection of nodes. Very useful if you’re filtering a very large initial graph function locateNode (nid) { if (nid == '') { locate.center(1); } else { locate.nodes(nid); } }; sigma.renderers.glyphs - Allows you to add custom glyphs to each node. Useful if you have many types of node. Outro This application has been a very fun little project to build. The improvements to Sigma wrought by Linkurious have resulted in an incredibly powerful toolkit to rapidly generate graph based applications with a great degree of flexibility and interaction potential. None of this would have been possible were it not for Python. Python is my right (left, I’m left handed) hand which I use for almost everything. Its versatility and expressiveness make it an incredibly robust Swiss army knife in any data-analysts toolkit.

0
0
8150

article-image-configuring-and-securing-virtual-private-cloud

Packt

16 Sep 2015

7 min read

Configuring and Securing a Virtual Private Cloud

Packt

16 Sep 2015

7 min read

0
0
984

How-To Tutorials - Languages

Delphi Cookbook

Practical Big Data Exploration with Spark and Python

Expert Python Programming: Interfaces

Testing Your Application with cljs.test

How is Python code organized

Unlocking the JavaScript Core

Modular Programming in ECMAScript 6

Unleashing Creativity with MIT App Inventor 2

Using JavaScript with HTML

Cython Won't Bite

Trending Topics

Parallelization using Reducers

HTML5 APIs

Understanding Ranges

Creating a graph application with Python, Neo4j, Gephi & Linkurious.js

Configuring and Securing a Virtual Private Cloud