
How-To Tutorials

7019 Articles
How to learn data science: from data mining to machine learning

Richard Gall
04 Sep 2019
6 min read
Data science is a field that's complex and diverse. If you're trying to learn data science and become a data scientist, it can be easy to fall down a rabbit hole of machine learning or data processing. To a certain extent, that's good: to be an effective data scientist you need to be curious, and prepared to take on a range of different tasks and challenges. But it isn't always efficient. If you want to learn quickly and effectively, you need a clear structure - a curriculum - that you can follow. This post will show you what you need to learn and how to go about it.

Statistics

Statistics is arguably the cornerstone of data science. Nate Silver called data scientists "sexed up statisticians", a comment that was perhaps unfair but nevertheless contains a kernel of truth: data scientists are always working in the domain of statistics. Once you understand this, everything else you need to learn will follow easily. Machine learning, data manipulation, data visualization - these are all ultimately technological methods for performing statistical analysis really well.

Best Packt books and videos for learning statistics:
- Statistics for Data Science
- R Statistics Cookbook
- Statistical Methods and Applied Mathematics in Data Science [Video]

Before you go any deeper into data science, it's critical that you gain a solid foundation in statistics.

Data mining and wrangling

This is an important element of data science that often gets overlooked amid all the hype about machine learning. However, without effective data collection and cleaning, all your efforts elsewhere are going to be pointless at best. At worst they might even be misleading or problematic. Sometimes called data manipulation or data munging, it's really all about managing and cleaning data from different sources so it can be used for analytics projects. To do it well you need to have a clear sense of where you want to get to - do you need to restructure the data? Sort or remove certain parts of a data set? Once you understand this, it's much easier to wrangle data effectively.

Data mining and wrangling tools

There are a number of different tools you can use for data wrangling. Python and R are the two key programming languages, and both have some useful tools for data mining and manipulation. Python in particular has a great range of tools for data mining and wrangling, such as pandas and NLTK (Natural Language Toolkit), but that isn't to say R isn't powerful in this domain. Other tools are available too - Weka and Apache Mahout, for example, are popular. Weka is written in Java, so it's a good option if you have experience with that language, while Mahout integrates well with the Hadoop ecosystem. A short pandas sketch follows the book list below.

Data mining and data wrangling books and videos

If you need to learn data mining, wrangling, and manipulation, Packt has a range of products. Here are some of the best:
- Data Wrangling with R
- Data Wrangling with Python
- Python Data Mining Quick Start Guide
- Machine Learning for Data Mining
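To make the wrangling step concrete, here is a minimal pandas sketch of the kind of cleaning work described above. The file name and column names are invented purely for illustration - swap in your own data.

    import pandas as pd

    # Load raw data (the file name here is hypothetical).
    df = pd.read_csv("customer_orders.csv")

    # Inspect structure and types before changing anything.
    print(df.info())

    # Remove exact duplicate rows.
    df = df.drop_duplicates()

    # Drop rows missing the key identifier.
    df = df.dropna(subset=["order_id"])

    # Normalize a messy text column so grouping works later.
    df["country"] = df["country"].str.strip().str.title()

    # Restructure: one row per country with total order value.
    summary = df.groupby("country")["order_value"].sum().reset_index()
    print(summary.head())

Even a small routine like this captures the essence of wrangling: inspect first, then clean, then restructure toward the shape your analysis needs.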
Machine learning and artificial intelligence

Although machine learning and artificial intelligence are huge trends in their own right, they are nevertheless closely aligned with data science. Indeed, you might even say that their prominence today has grown out of the excitement around data science that we first witnessed just under a decade ago. It's a data scientist's job to use machine learning and artificial intelligence in a way that can drive business value. That could, for example, be to recommend products or services to customers, to gain a better understanding of existing products, or even to better manage strategic and financial risks through predictive modelling. So, while we can see machine learning in a massive range of digital products and platforms - all of which require smart development and design - for it to work successfully, it needs to be supported by a capable and creative data scientist.

Machine learning and artificial intelligence books for data scientists:
- Machine Learning Algorithms
- Machine Learning with R - Third Edition
- Machine Learning with Apache Spark Quick Start Guide
- Machine Learning with TensorFlow 1.x
- Keras Deep Learning Cookbook

Data visualization

A talented data scientist isn't just a great statistician and engineer, they're also a great communicator. This means so-called soft skills are highly valuable - the ability to communicate insights and ideas to key stakeholders is essential. But great communication isn't just about soft skills, it's also about data visualization. Data visualization is, at a fundamental level, about organizing and presenting data in a way that tells a story, clarifies a problem, or illustrates a solution. It's essential that you don't overlook this step. Indeed, spending time learning about effective data visualization can also help you to develop your soft skills: the principles behind storytelling and communication through visualization are, in truth, exactly the same when applied to other scenarios.

Data visualization tools

There is a huge range of data visualization tools available. As with machine learning, understanding the differences between them and working out what solution will work for you is an important part of the learning process. For that reason, don't be afraid to spend a little bit of time with a range of data visualization tools. Many of the most popular data visualization tools are paid-for products. Perhaps the best known of these is Tableau (which, incidentally, was bought by Salesforce earlier this year). Tableau and its competitors are very user friendly, which means the barrier to entry is pretty low, and they allow you to create some pretty sophisticated data visualizations fairly easily. However, sticking to these tools is not only expensive, it can also limit your abilities. We'd recommend trying a number of different data visualization tools, such as Seaborn, D3.js, Matplotlib, and ggplot2 - a short Matplotlib sketch follows the book list below.

Data visualization books and videos for data scientists:
- Applied Data Visualization with R and ggplot2
- Tableau 2019.1 for Data Scientists [Video]
- D3.js Data Visualization Projects [Video]
- Tableau in 7 Steps [Video]
- Data Visualization with Python
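To show how low the barrier is with the open source tools, here is a minimal Matplotlib sketch; the data is invented purely for the example.

    import matplotlib.pyplot as plt

    # Invented example data: monthly active users over half a year.
    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
    users = [1200, 1350, 1280, 1600, 1750, 2100]

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(months, users, marker="o")

    # A visualization should tell a story: label everything.
    ax.set_title("Monthly active users, H1")
    ax.set_xlabel("Month")
    ax.set_ylabel("Active users")

    plt.tight_layout()
    plt.show()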
If you want to learn data science, just get started!

As we've seen, data science requires a number of very different skills and takes in a huge breadth of tools. That means that if you're going to be a data scientist, you need to be prepared to commit to learning forever: you're never going to reach a point where you know everything. While that might sound intimidating, it's important to have confidence. With a sense of direction and purpose, and a learning structure that works for you, it's possible to develop and build your data science capabilities in a way that could unlock new opportunities and act as the basis for some really exciting projects.

What's new in USB4? Transfer speeds of up to 40 Gbps with Thunderbolt 3 and more

Sugandha Lahoti
04 Sep 2019
3 min read
USB4 technical specifications were published yesterday. Along with removing the space in the stylization (USB4 instead of USB 4), the new version offers double the speed of its previous versions. The USB4 architecture is based on Intel's Thunderbolt; Intel had provided Thunderbolt 3 to the USB Promoter Group royalty-free earlier this year.

Features of USB4

USB4 runs blazingly fast by incorporating Intel's Thunderbolt technology. It allows transfers at the rate of 40 gigabits per second - twice the speed of the latest version of USB 3 and 8 times the speed of the original USB 3 standard. 40Gbps speeds can, for example, allow users to connect two 4K monitors at once, or run high-end external GPUs with ease. Key characteristics specified in the USB4 specification include:
- Two-lane operation using existing USB Type-C cables, and up to 40 Gbps operation over 40 Gbps certified cables
- Multiple data and display protocols to efficiently share the total available bandwidth over the bus
- Backward compatibility with USB 3.2, USB 2.0 and Thunderbolt 3

Another piece of good news is that USB4 will use the same USB-C connector design as USB 3, which means manufacturers will not need to introduce new USB4 ports into their devices.

Why USB4 omits a space

The change in stylization was made to simplify things. In an interview with Tom's Hardware, USB Promoter Group CEO Brad Saunders said this is to prevent a profusion of products sporting version number badges that could confuse consumers. "We don't plan to get into a 4.0, 4.1, 4.2 kind of iterative path," he explained. "We want to keep it as simple as possible. When and if it goes faster, we'll simply have the faster version of the certification and the brand."

Is Thunderbolt 3 compatibility optional?

The published specification states that Thunderbolt 3 support is optional: it's up to USB4 device makers to support it. This was a major topic of discussion on social media.

https://twitter.com/KevinLozandier/status/1169106844289077248

The USB Implementers Forum released a detailed statement clarifying the issue. "Regarding USB4 specification's optional support for Thunderbolt 3, USB-IF anticipates PC vendors to broadly support Thunderbolt 3 compatibility in their USB4 solutions given Thunderbolt 3 compatibility is now included in the USB4 specification and therefore royalty free for formal adopters," the USB-IF said in a statement. "That said, Intel still maintains the Thunderbolt 3 branding/certification so consumers can look for the appropriate Thunderbolt 3 logo and brand name to ensure the USB4 product in question has the expected Thunderbolt 3 compatibility. Furthermore, the decision was made not to make Thunderbolt 3 compatibility a USB4 specification requirement as certain manufacturers (e.g. smartphone makers) likely won't need to add the extra capabilities that come with Thunderbolt 3 compatibility when designing their USB4 products," the statement added.

Though the specification has been released, it will be some time before USB4-compatible devices hit the market. We can expect to see devices that take advantage of the new version in late 2020 or beyond.

Read more in Hardware:
- USB-IF launches 'Type-C Authentication Program' for better security
- Apple USB Restricted Mode: Here's Everything You Need to Know
- USB 4 will integrate Thunderbolt 3 to increase the speed to 40Gbps

Implementing memory management with Golang's garbage collector

Packt Editorial Staff
03 Sep 2019
10 min read
Did you ever wonder how bulk messages are pushed in real time so fast? How is it possible? A low-latency garbage collector (GC) plays an important role here. In this article, we present ways to look at certain parameters to implement memory management with the Golang GC. Garbage collection is the process of freeing up memory space that is not being used. In other words, the GC sees which objects are out of scope and cannot be referenced anymore, and frees the memory space they consume. This process happens concurrently while a Go program is running, not before or after the execution of the program.

This article is an excerpt from the book Mastering Go - Third Edition by Mihalis Tsoukalos. Mihalis runs through the nuances of Go, with deep guides to types and structures, packages, concurrency, network programming, compiler design, optimization, and more.

Implementing the Golang GC

The Go standard library offers functions that allow you to study the operation of the GC and learn more about what the GC does secretly. These functions are illustrated in the gColl.go utility. The source code of gColl.go is presented here in chunks.

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

You need the runtime package because it allows you to obtain information about the Go runtime system, which, among other things, includes the operation of the GC.

    func printStats(mem runtime.MemStats) {
        runtime.ReadMemStats(&mem)
        fmt.Println("mem.Alloc:", mem.Alloc)
        fmt.Println("mem.TotalAlloc:", mem.TotalAlloc)
        fmt.Println("mem.HeapAlloc:", mem.HeapAlloc)
        fmt.Println("mem.NumGC:", mem.NumGC, "\n")
    }

The purpose of the printStats() function is to avoid writing the same Go code all the time. The runtime.ReadMemStats() call gets the latest garbage collection statistics for you.

    func main() {
        var mem runtime.MemStats
        printStats(mem)

        for i := 0; i < 10; i++ { // Allocating 50,000,000 bytes
            s := make([]byte, 50000000)
            if s == nil {
                fmt.Println("Operation failed!")
            }
        }
        printStats(mem)

In this part, we have a for loop that creates 10 byte slices of 50,000,000 bytes each. The reason for this is that by allocating large amounts of memory, we can trigger the GC.

        for i := 0; i < 10; i++ { // Allocating 100,000,000 bytes
            s := make([]byte, 100000000)
            if s == nil {
                fmt.Println("Operation failed!")
            }
            time.Sleep(5 * time.Second)
        }
        printStats(mem)
    }

The last part of the program makes even bigger memory allocations - this time, each byte slice has 100,000,000 bytes. Running gColl.go on a macOS Big Sur machine with 24 GB of RAM produces the following kind of output:

    $ go run gColl.go
    mem.Alloc: 124616
    mem.TotalAlloc: 124616
    mem.HeapAlloc: 124616
    mem.NumGC: 0
    mem.Alloc: 50124368
    mem.TotalAlloc: 500175120
    mem.HeapAlloc: 50124368
    mem.NumGC: 9
    mem.Alloc: 122536
    mem.TotalAlloc: 1500257968
    mem.HeapAlloc: 122536
    mem.NumGC: 19

The value of mem.Alloc is the bytes of allocated heap objects - "allocated" covers all the objects that the GC has not yet freed. mem.TotalAlloc shows the cumulative bytes allocated for heap objects; this number does not decrease when objects are freed, which means that it keeps increasing, so it shows the total number of bytes allocated for heap objects during program execution. mem.HeapAlloc is the same as mem.Alloc. Last, mem.NumGC shows the total number of completed garbage collection cycles.
The bigger that value is, the more you have to consider how you allocate memory in your code and whether there is a way to optimize it.

If you want even more verbose output regarding the operation of the GC, you can combine go run gColl.go with GODEBUG=gctrace=1. Apart from the regular program output, you get some extra metrics, as illustrated in the following output:

    $ GODEBUG=gctrace=1 go run gColl.go
    gc 1 @0.021s 0%: 0.020+0.32+0.015 ms clock, 0.16+0.17/0.33/0.22+0.12 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
    gc 2 @0.041s 0%: 0.074+0.32+0.003 ms clock, 0.59+0.087/0.37/0.45+0.030 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
    . . .
    gc 18 @40.152s 0%: 0.065+0.14+0.013 ms clock, 0.52+0/0.12/0.042+0.10 ms cpu, 95->95->0 MB, 96 MB goal, 8 P
    gc 19 @45.160s 0%: 0.028+0.12+0.003 ms clock, 0.22+0/0.13/0.081+0.028 ms cpu, 95->95->0 MB, 96 MB goal, 8 P
    mem.Alloc: 120672
    mem.TotalAlloc: 1500256376
    mem.HeapAlloc: 120672
    mem.NumGC: 19

Now, let us explain the 95->95->0 MB triplet in the previous line of output. The first value (95) is the heap size when the GC is about to run. The second value (95) is the heap size when the GC ends its operation. The last value (0) is the size of the live heap.

Go garbage collection is based on the tricolor algorithm

The operation of the Go GC is based on the tricolor algorithm, which is the subject of this subsection. Note that the tricolor algorithm is not unique to Go and can be used in other programming languages as well. Strictly speaking, the official name for the algorithm used in Go is the tricolor mark-and-sweep algorithm. It can work concurrently with the program and uses a write barrier. This means that when a Go program runs, the Go scheduler is responsible for the scheduling of both the application and the GC, as if the Go scheduler had to deal with a regular application with multiple goroutines! The core idea behind this algorithm came from Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens, and was first illustrated in a paper named On-the-Fly Garbage Collection: An Exercise in Cooperation.

The primary principle behind the tricolor mark-and-sweep algorithm is that it divides the objects of the heap into three different sets according to their color, which is assigned by the algorithm. The objects of the black set are guaranteed to have no pointers to any object of the white set. However, an object of the white set can have a pointer to an object of the black set, because this has no effect on the operation of the GC. The objects of the gray set might have pointers to some objects of the white set. Finally, the objects of the white set are the candidates for garbage collection.

When garbage collection begins, all objects are white, and the GC visits all the root objects and colors them gray. The roots are the objects that can be directly accessed by the application, which includes global variables and other things on the stack; these mostly depend on the Go code of a particular program. After that, the GC picks a gray object, makes it black, and starts looking at whether that object has pointers to objects of the white set: when an object of the gray set is scanned for pointers to other objects, it is colored black. If that scan discovers that this particular object has one or more pointers to a white object, it puts that white object in the gray set. This process keeps going for as long as objects exist in the gray set. After that, the objects in the white set are unreachable and their memory space can be reused; at this point, the elements of the white set are said to be garbage collected. A toy sketch of a single mark phase follows.
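To make the marking process easier to follow, here is a language-agnostic toy sketch of a single mark phase, written in Python for brevity. It is a simplification for illustration only: the real Go collector runs concurrently and uses a write barrier, and the object names here are invented.

    # Toy model of one tricolor mark phase (illustration only).
    # The heap is a graph: each object maps to the objects it points to.
    heap = {
        "a": ["b"],  # "a" is a root that points to "b"
        "b": ["c"],
        "c": [],
        "x": ["y"],  # "x" and "y" are unreachable from the roots
        "y": [],
    }
    roots = ["a"]

    white = set(heap)  # every object starts in the white set
    gray = set()
    black = set()

    # Color the root objects gray.
    for r in roots:
        white.discard(r)
        gray.add(r)

    # Scan gray objects until the gray set is empty.
    while gray:
        obj = gray.pop()
        black.add(obj)  # a scanned gray object is colored black
        for child in heap[obj]:
            if child in white:  # reachable white objects turn gray
                white.discard(child)
                gray.add(child)

    # Whatever is still white is unreachable: a candidate for sweeping.
    print("garbage:", white)  # prints {'x', 'y'} (set order may vary)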
Please note that no object can go directly from the black set to the white set, which is what allows the algorithm to operate and clear the objects in the white set. Moreover, as mentioned before, no object of the black set can directly point to an object of the white set. Additionally, if an object of the gray set becomes unreachable at some point in a garbage collection cycle, it will not be collected in that cycle but in the next one! Although this is not an optimal situation, it is not that bad.

During this process, the running application is called the mutator. The mutator runs a small function named the write barrier that is executed each time a pointer in the heap is modified. If the pointer of an object in the heap is modified, which means that this object is now reachable, the write barrier colors it gray and puts it in the gray set. The mutator is responsible for preserving the invariant that no element of the black set has a pointer to an element of the white set; this is accomplished with the help of the write barrier function. Failing to maintain this invariant will ruin the garbage collection process and will most likely crash your program in a pretty bad and undesirable way!

So, there are three different colors: black, white, and gray. When the algorithm begins, all objects are colored white. As the algorithm keeps going, white objects are moved into one of the other two sets. The objects that are left in the white set are the ones that are going to be cleared at some point. The next figure displays the three color sets with objects in them.

Figure 1: The Go GC represents the heap of a program as a graph

In the presented graph, you can see that while object E, which is in the white set, can access object F, it cannot be accessed by any other object, because no other object points to object E; this makes it a perfect candidate for garbage collection. Additionally, objects A, B, and C are root objects and are always reachable; therefore, they cannot be garbage collected.

Graph comprehended

Can you guess what will happen next in that graph? It is not that difficult to realize that the algorithm will have to process the remaining elements of the gray set, which means that both objects A and F will go to the black set: object A because it is a root element, and F because it does not point to any other object while it is in the gray set. After object E is garbage collected, object F will become unreachable and will be garbage collected in the next cycle of the GC, because an unreachable object cannot magically become reachable again in a later iteration of the garbage collection cycle.

Note: Go garbage collection can also be applied to variables such as channels. When the GC finds out that a channel is unreachable - that is, when the channel variable cannot be accessed anymore - it will free its resources even if the channel has not been closed.

Go allows you to manually initiate a garbage collection by putting a runtime.GC() statement in your Go code. However, keep in mind that runtime.GC() will block the caller, and it might block the entire program, especially if you are running a very busy Go program with many objects.
This mainly happens because you cannot perform garbage collection while everything else is rapidly changing, as this would not give the GC the opportunity to clearly identify the members of the white, black, and gray sets. This garbage collection status is also called a garbage collection safe-point.

You can find the long and relatively advanced Go code of the GC at https://github.com/golang/go/blob/master/src/runtime/mgc.go, which you can study if you want to learn even more about the garbage collection operation. You can even make changes to that code if you are brave enough!

Read next:
- Understanding Go Internals: defer, panic() and recover() functions [Tutorial]
- Implementing hashing algorithms in Golang [Tutorial]
- Is Golang truly community driven and does it really matter?

How to ace a data science interview

Richard Gall
02 Sep 2019
12 min read
So, you want to be a data scientist. It's a smart move: it's a job that's in high demand, can command a healthy salary, and can also be richly rewarding and engaging. But to get the job, you're going to have to pass a data science interview - something that's notoriously tough. One of the reasons for this is that data science is an incredibly diverse field. I mean that in two different ways: on the one hand, it's a role that demands a variety of different skills (being a good data scientist is about much more than just being good at math). But it's also diverse in the sense that data science will be done differently at every company. That means that every data science interview is going to be different, and if you specialize too much in one area, you might well be severely limiting your opportunities.

There are plenty of articles out there that pretend to have all the answers to your next data science interview. And while these can be useful, they also treat job interviews like they're just exams you need to pass. They're not - you need to have a wide range of knowledge, but you also need to present yourself as a curious and critical thinker, and someone who is very good at communicating. You won't get a data science job by knowing all the answers. But you might get it by asking the right questions and talking in the right way. So, with all that in mind, here's what you need to do to ace your data science interview.

Know the basics of data science

This is obvious, but it's impossible to overstate. If you don't know the basics, there's no way you'll get the job - indeed, it's probably better for your sake that you don't get it! But what are these basics?

Basic data science interview questions

"What is data science?" This seems straightforward, but proving you've done some thinking about what the role actually involves demonstrates that you're thoughtful and self-aware - a sign of any good employee.

"What's the difference between supervised and unsupervised learning?" Again, this is straightforward, but it will give the interviewer confidence that you understand the basics of machine learning algorithms.

"What is the bias-variance tradeoff? What are overfitting and underfitting?" Being able to explain these concepts in a clear and concise manner demonstrates your clarity of thought. It also shows that you have a strong awareness of the challenges of using machine learning and statistical systems.

If you're applying for a job as a data scientist you'll probably already know the answers to all of these. Just make sure you have a clear answer for each and that you can explain it concisely.

Know your algorithms

Knowing your algorithms is a really important part of any data science interview. However, it's important not to get hung up on the details. Trying to learn everything about every algorithm you know isn't only impossible, it's also not going to get you the job. What's important instead is demonstrating that you understand the differences between algorithms, and when to use one over another.

Data science interview questions about algorithms you might be asked

"When would you use a supervised machine learning algorithm?"
"Can you name some supervised machine learning algorithms and the differences between them?" (Supervised machine learning algorithms include Support Vector Machines, Naive Bayes, the K-nearest Neighbor Algorithm, Regression, and Decision Trees.)
"When would you use an unsupervised machine learning algorithm?"
"Name some unsupervised machine learning algorithms and how they're different from one another." (Unsupervised machine learning algorithms include K-Means, autoencoders, Generative Adversarial Networks, and Deep Belief Nets.)
"What are classification algorithms?"

A toy illustration of the supervised/unsupervised distinction follows this list.
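As a quick, hedged illustration of that distinction, here is a minimal scikit-learn sketch on synthetic data - not something an interviewer would ask you to write, just a reminder of the difference in practice.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    # Synthetic 2D data: two loose blobs (invented for illustration).
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)  # labels exist only in the supervised case

    # Supervised: learn a mapping from features to known labels.
    clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    print(clf.predict([[4.5, 4.8]]))   # predicts a label for a new point

    # Unsupervised: no labels, just structure discovered in the data.
    km = KMeans(n_clusters=2, n_init=10).fit(X)
    print(km.labels_[:5])              # cluster assignments it found itself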
There are other algorithm questions you might face, but try to focus on these as core areas. Remember, it's also important to always talk about your experience - that's just as useful, if not more useful, than listing the differences between machine learning algorithms. Some of the questions you face in a data science interview might even be about how you use algorithms:

"Tell me about a time you used an algorithm. Why did you decide to use it? Were there any other options?"
"Tell me about a time you used an algorithm and it didn't work how you expected it to. What did you do?"

When talking about algorithms in a data science interview it's useful to present them as tools for solving business problems. It can be tempting to talk about them as mathematical concepts, and although it's good to show off your understanding, showing how algorithms help solve real-world business problems will be a big plus for your interviewer.

Be confident talking about data sources and infrastructure challenges

One of the biggest challenges for data scientists is dealing with incomplete or poor quality data. If that's something you've faced - or even if it's something you think you might face in the future - then make sure you talk about it. Data scientists aren't always responsible for managing a data infrastructure (that will vary from company to company), but even if that isn't in the job description, it's likely that you'll have to work with a data architect to make sure data is available and accurate enough to carry out data science projects. This means that understanding topics like data streaming, data lakes, and data warehouses is very important in a data science interview. Again, remember that it's important not to get stuck on the details. You don't need to recite everything you know; instead, talk about your experience or how you might approach problems in different ways.

Data science interview questions you might get asked about using different data sources

"How do you work with data from different sources?"
"How have you tackled dirty or unreliable data in the past?"

Data science interview questions you might get asked about infrastructure

"Talk me through a data infrastructure challenge you've faced in the past."
"What's the difference between a data lake and a data warehouse? How would you approach each one differently?"

Show that you have a robust understanding of data science tools

You can't get through a data science interview without demonstrating that you have knowledge and experience of data science tools. It's likely that the job you're applying for will mention a number of different skill requirements in the job description, so make sure you have a good knowledge of them all. Obviously, the best case scenario is that you know all the tools mentioned in the job description inside out - but this is unlikely. If you don't know one - or more - make sure you understand what they're for and how they work. The hiring manager probably won't expect candidates to know everything, but they will expect them to be ready and willing to learn. If you can talk about a time you learned a new tool, that will give the interviewer a lot of confidence that you're someone who can pick up knowledge and skills quickly.
Show you can evaluate different tools and programming languages

Another element here is being able to talk about the advantages and disadvantages of different tools. Why might you use R over Python? Which Python libraries should you use to solve a specific problem? And when should you just use Excel? Sometimes the interviewer might ask for your own personal preferences. Don't be scared of giving your opinion - as long as you have a considered explanation for why you hold the opinion that you do, you're fine!

Read next: Why is Python so good for AI and Machine Learning? 5 Python Experts Explain

Data science interview questions about tools that you might be asked

"What tools have you - or could you - use for data processing and cleaning? What are their benefits and disadvantages?" (These include tools such as Hadoop, Pentaho, Flink, Storm, and Kafka.)
"What tools do you think are best for data visualization and why?" (This includes tools like Tableau, PowerBI, D3.js, Infogram, and Chartblocks - there are so many different products in this space that it's important that you are able to talk about what you value most in data visualization tools.)
"Do you prefer using Python or R? Are there times when you'd use one over the other?"
"Talk me through some machine learning libraries. How do they compare to one another?" (This includes tools like TensorFlow, Keras, and PyTorch. If you don't have any experience with them, make sure you're aware of the differences, and talk about which you are most curious about learning.)

Always focus on business goals and results

This sounds obvious, but it's easy to forget - especially if you're a data geek who loves to talk about statistical models and machine learning. To combat this, make sure you're very clear on how your experience was tied to business goals. Take some time to think about why you were doing what you were doing. What were you trying to find out? What metrics were you trying to drive?

Interpersonal and communication skills

Another element of this is talking about your interpersonal skills and your ability to work with a range of different stakeholders. Think carefully about how you worked alongside other teams, and how you went about capturing requirements and building solutions for them. Think also about how you managed - or would manage - expectations. It's well known that business leaders can expect data to be a silver bullet when it comes to results, so how do you make sure that people stay realistic?

Show off your data science portfolio

A good way of showing your business acumen as a data scientist is to build a portfolio of work. Portfolios are typically viewed as something for creative professionals, but they're becoming increasingly popular in the tech industry as competition for roles gets tougher. This post explains everything you need to build a great data science portfolio. Broadly, the most important thing is that it demonstrates how you have added value to an organization. This could be:
- Insights you've shared in reports with management
- Customer-facing applications that rely on data
- Internal dashboards and applications

Bringing a portfolio to an interview can give you a solid foundation from which to answer questions. But remember - you might be asked questions about your work, so make sure you have an answer prepared!

Data science interview questions about business performance

"Talk about a time you have worked across different teams."
"How do you manage stakeholder expectations?"
"What do you think are the most important elements in communicating data insights to management?"

If you can talk fluently about how your work impacts business performance, and about how you worked alongside others in non-technical positions, you will give yourself a good chance of landing the job!

Show that you understand ethical and privacy issues in data science

This might seem like a superfluous point, but given the events of recent years - like the Cambridge Analytica scandal - ethics has become a big topic of conversation. Employers will expect prospective data scientists to have an awareness of some of these problems and of how to go about mitigating them. To some extent, this is an extension of the previous point. Showing you are aware of ethical issues, such as privacy and discrimination, proves that you are fully engaged with the needs and risks a business might face. It also underlines that you are aware of the consequences and potential impact of data science activities on customers - what your work does in the real world.

Read next: Introducing Deon, a tool for data scientists to add an ethics checklist

Data science interview questions about ethics and privacy

"What are some of the ethical issues around machine learning and artificial intelligence?"
"How can you mitigate any of these issues? What steps would you take?"
"Has GDPR impacted the way you do data science?"
"What are some other privacy implications for data scientists?"
"How do you understand explainability and interpretability in machine learning?"

Ethics is a topic that's easy to overlook, but it's essential for every data scientist. To get a good grasp of the issues, it's worth investigating more technical content on things like machine learning interpretability, as well as following news and commentary around emergent issues in artificial intelligence.

Conclusion: Don't treat a data science interview like an exam

Data science is a complex and multi-faceted field. That can make data science interviews feel like a serious test of your knowledge - and it can be tempting to revise as you would for an exam. But, as we've seen, that's foolish. To ace a data science interview you can't just recite information and facts. You need to talk clearly and confidently about your experience and demonstrate your drive and curiosity. That doesn't mean you shouldn't make sure you know the basics. But rather than getting too hung up on definitions and statistical details, it's a better use of your time to consider how you have performed in your roles in the past, and what you might do in the future. A thoughtful, curious data scientist is immensely valuable. Show your interviewer that you are one.

Data science vs. machine learning: understanding the difference and what it means today

Richard Gall
02 Sep 2019
8 min read
One of the things that I really love about the tech industry is how often different terms - buzzwords especially - can cause confusion. It isn't hard to see this in the wild: Quora is replete with confused people asking about the difference between a 'developer' and an 'engineer', and how 'infrastructure' is different from 'architecture'. One of the biggest points of confusion is the difference between data science and machine learning. Both terms refer to different but related domains - given their popularity, it isn't hard to see how some people might be a little perplexed.

This might seem like a purely semantic problem, but in the context of people's careers, as they make decisions about the resources they use and the courses they pay for, the distinction becomes much more important. Indeed, it can be perplexing for developers thinking about their career - with 'machine learning engineer' starting to appear across job boards, it's not always clear where that role ends and 'data scientist' begins.

Tl;dr: To put it simply - and if you can't be bothered to read further - data science is a discipline or job role that's all about answering business questions through data. Machine learning, meanwhile, is a technique that can be used to analyze or organize data. So, data scientists might well use machine learning to find something out, but it would only be one aspect of their job.

But what are the implications of this distinction between machine learning and data science? What can the relationship between the two terms tell us about how technology trends evolve? And how can it help us better understand them both?

Read next: 9 data science myths debunked

What's causing confusion about the difference between machine learning and data science?

The data science v machine learning confusion comes from the fact that both terms have a significant grip on the collective imagination of the tech and business world. Back in 2012 the Harvard Business Review declared data scientist to be the 'sexiest job of the 21st century'. This was before the machine learning and artificial intelligence boom, but it's the point we need to go back to in order to understand how data has shaped the tech industry as we know it today.

Data science v machine learning on Google Trends

Take a look at this Google Trends graph:

Both terms broadly received a similar level of interest. 'Machine learning' was slightly higher throughout the noughties, and a larger gap has emerged more recently. Despite that, it's worth looking at the period around 2014 when 'data science' managed to eclipse 'machine learning'. Today, that feels remarkable given how far machine learning has extended into popular consciousness. It suggests that the HBR article was incredibly timely in identifying the emergence of the field. But more importantly, it's worth noting that this spike for 'data science' came at a time when both terms were surging in popularity. So, although machine learning eventually wins out, 'data science' was becoming particularly important just as these twin trends were starting to grow.

This is interesting, and it's contrary to what I'd expect. Typically, I'd imagine the more technical term to take precedence over the more conceptual field: a technical trend emerges first, and a more abstract concept gains traction afterwards. But here the concept - the discipline - spikes just at the point before machine learning properly takes off.
This suggests that the evolution and growth of machine learning begins with the foundations of data science. This is important. It highlights that the obsession with data science - which might well have seemed somewhat self-indulgent - was, in fact, an integral step for business to properly make sense of what the 'big data revolution' (a phrase that sounds eighty years old) meant in practice. Insofar as 'data science' is a term that really just refers to a role that's performed, its growth was ultimately evidence of a space being carved out inside modern businesses that gave a domain expert the freedom to explore and invent in the service of business objectives.

If that was the baseline, then the continued rise of machine learning feels inevitable. From being contained in computer science departments in academia, and then spreading into business thanks to the emergence of the data scientist job role, we then started to see a whole suite of tools and use cases that were about much more than analytics and insight. Machine learning became a practical tool with practical applications everywhere. From cybersecurity to mobile applications, from marketing to accounting, machine learning couldn't be contained within the data science discipline. This wasn't just a conceptual point - practically speaking, a data scientist simply couldn't provide support for all the different ways in which business functions wanted to use machine learning.

So, the confusion around the relationship between machine learning and data science stems from the fact that the two trends go hand in hand - or at least they used to. To properly understand how they're different, let's look at what a data scientist actually does.

Read next: Data science for non-techies: How I got started (Part 1)

What is data science, exactly?

I know you're not supposed to use Wikipedia as a reference, but the opening sentence of its entry for 'data science' is instructive: "Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data." The word that deserves your attention is multi-disciplinary, as this underlines what makes data science unique and why it stands outside the more specific taxonomy of machine learning terms. Essentially, it's a human activity as much as a technical one - it's about arranging, organizing, interpreting, and communicating data.

To a certain extent it shares a common thread of DNA with statistics. But although Nate Silver said that 'data scientist' was "a sexed up term for statistician", I think there are some important distinctions. To do data science well you need to be deeply engaged with how your work integrates with the wider business strategy and processes. The term 'statistics' - like 'machine learning' - doesn't quite capture this. Indeed, to a certain extent this has made data science a challenging field to work in. It isn't hard to find evidence that data scientists are trying to leave their jobs, frustrated with how their roles are being used and how they integrate into existing organisational structures.

How do data scientists use machine learning?

As a data scientist, your job is to answer questions. These are questions like:
- What might happen if we change the price of a product in this way?
- What do our customers think of our products?
- How often do customers purchase products?
- How are customers using our products?
- How can we understand the existing market? How might we tackle it?
- Where could we improve efficiencies in our processes?

That's just a small set: the questions data scientists tackle will vary depending on the industry and the company - every data science job is unique. But whatever questions data scientists are asking, it's likely that at some point they'll be using machine learning. Whether it's analyzing customer sentiment (grouping and sorting) or predicting outcomes, a data scientist will have a number of algorithms up their proverbial sleeves, ready to tackle whatever the business throws at them.

Machine learning beyond data science

The machine learning revolution might have started in data science, but it has rapidly expanded far beyond that strict discipline. Indeed, one of the reasons that some people are confused about the relationship between the two concepts is that machine learning today touches just about everything, like water spilling out of its neat data science container.

Machine learning is for everyone

Machine learning is being used in everything from mobile apps to cybersecurity. And although data scientists might sometimes play a part in these domains, we're also seeing subject-specific developers and engineers taking more responsibility for how machine learning is used. One of the reasons for this is, as I mentioned earlier, the fact that a data scientist - or even a couple of them - can't do everything a business might want when it comes to machine learning. But another is the fact that machine learning is getting easier. You no longer need to be an expert to employ machine learning algorithms - instead, you need the confidence and foundational knowledge to use existing machine learning tools and products.

This 'productization' of machine learning is arguably what's having the biggest impact on how we understand the topic. It's even shrinking data science, making it a more specific role. That might sound like data science is less important today than it was in 2014, but it can only be a good thing for data scientists - it means they're no longer being asked to spread themselves so thinly.

So, if you've been googling 'data science v machine learning', you now know the answer. The two terms are distinct, but they both come out of the 'big data revolution', which we're still living through. Both trends and terms are likely to evolve in the future, but they're certainly not going to disappear - as the data at our disposal grows, making effective use of it is only going to become more important.

Wasmer's first Postgres extension to run WebAssembly is here!

Vincy Davis
30 Aug 2019
2 min read
Wasmer, the WebAssembly runtime, has been successfully embedded in many languages, including Rust, Python, Ruby, PHP, and Go. Yesterday, Ivan Enderlin, a PhD computer scientist at Wasmer, announced version 0.1 of a new Postgres extension for WebAssembly. Since the project is still under heavy development, the extension only supports integers (32- and 64-bit) and works on Postgres 10 only. It does not yet support strings, records, views, or any other Postgres types.

https://twitter.com/WasmWeekly/status/1167330724787171334

The official post states, "The goal is to gather a community and to design a pragmatic API together, discover the expectations, how developers would use this new technology inside a database engine."

The Postgres extension provides two foreign data wrappers, wasm.instances and wasm.exported_functions, in the wasm foreign schema. wasm.instances is a table with id and wasm_file columns; wasm.exported_functions is a table with instance_id, name, inputs, and outputs columns. Enderlin says that this information is enough for the wasm Postgres extension "to generate the SQL function to call the WebAssembly exported functions."

Read also: Wasmer introduces WebAssembly Interfaces for validating the imports and exports of a Wasm module

The Wasmer team ran a basic benchmark computing the Fibonacci sequence to compare execution time between WebAssembly and PL/pgSQL. The benchmark was run on a 2016 MacBook Pro 15" with a 2.9GHz Core i7 and 16GB of memory.

Image Source: Wasmer

The result was that the "Postgres WebAssembly extension is faster to run numeric computations. The WebAssembly approach scales pretty well compared to the PL/pgSQL approach, in this situation." The Wasmer team believes that though it is too soon to consider WebAssembly an alternative to PL/pgSQL, the result makes them hopeful that it can be explored further. To learn more about the Postgres extension, check out its GitHub page.

Read next:
- 5 barriers to learning and technology training for small software development teams
- Mozilla CEO Chris Beard to step down by the end of 2019 after five years in the role
- JavaScript will soon support optional chaining operator as its ECMAScript proposal reaches stage 3
5 barriers to learning and technology training for small software development teams

Richard Gall
30 Aug 2019
9 min read
Managing, supporting, and facilitating training for software developers isn't easy. Such is the pace of change that it can be tough to ensure that your developers not only have the skills they need, but also the resources at their disposal to explore new technologies, try new approaches, and solve complex problems.

Okay, sure: there are a wealth of free resources out there. But using these is really just learning to walk for many developers. It's essential, but it can't be the only thing developers have at their disposal. If it is, their company is doing them a disservice. And even if a company is committed to its team's development, how to go about it still isn't straightforward. There are a number of barriers that need to be overcome when it comes to technology training and learning. Here are five of them.

Barrier 1: Picking resources on the right topics

The days of developers specializing are long gone - particularly in a small tech team where everyone has to get stuck in. Full-stack development has ensured that today's software developers need to be confident on everything from databases to UI design, while DevOps means they need to assume responsibility for how their code actually runs in production. The days of "it worked on my machine!" are over.

Read next: DevOps engineering and full-stack development - 2 sides of the same agile coin

When you factor in cloud native and hybrid cloud, developers today might well be working not only on cloud platforms with a bewildering array of features and opportunities, but on a number of different platforms at once. This means that understanding exactly what your developers need to know can be immensely difficult. It also means that you have to set the technology agenda from the off, immediately limiting your developers' ability to solve problems themselves. That's only going to alienate your developers - you're effectively telling them that their curiosity, and their desire to explore new topics and alternative solutions, is pointless.

What can you do? The only way to solve this is to ensure your developers have a rich set of learning resources. Don't limit them to a specific tool or set of technologies. Allow them to explore new topics as they see fit - don't force them down a pre-set path. The other benefit of this is that it means you can be flexible and open in terms of the technology you use.

Barrier 2: Legacy software and brownfield projects

Sometimes, however, you can't be flexible about the technology you use. Your developers simply need to use existing tools to manage legacy systems and brownfield projects (which build on or demolish existing legacy software). This means that learning needs can become complex. Not only might your development team need a diverse range of skills for one particular project, they might also need to learn very different skill sets for different projects. Maybe you're planning a new project built in React.js - great, get your developers on some cutting-edge content. But then, alongside this, what if they also need to tackle legacy software built using PHP? It's fine if they know PHP, but what if you have a new developer? Or what if they haven't used it for years?

What can you do? As above, a rich set of training and learning resources is vital. But in this instance it's particularly important to ensure developers have access to learning materials on a wide range of technologies. So yes, cutting-edge topic coverage is essential, but so is content on established and even more residual technologies.
Barrier 3: Lack of time for learning new software skills and concepts

Lack of time is one of the biggest barriers to learning and training in the tech industry. At least that's the perception - in reality, a lack of time to learn is caused by resource challenges and the cultural status of learning inside an organization. If learning isn't a priority, it's often the thing that gets pushed back as new day-to-day activities insert themselves into your team's schedule, or existing ones stretch out to fill the hours in a way that no one expected. This is risky. Although there are times when things simply need to get done quickly, overlooking learning will not only lead to disengagement in your team, it will also leave them unprepared for challenges that may emerge over the course of a project. It's often said that software development is all about solving problems - but how confident can we be that we're equipped to solve them if we're not committed to making time for learning?

What can you do? The first step is obvious: make time for learning. One method is to set aside a specific period in the week which your team is encouraged to use for personal development. While that can work well, sometimes trying to structure learning in this way can make it feel like a chore for employees, as if you're trying to fit them into a predetermined routine. Another option is to simply encourage continuous learning. That might mean a short period every day just to learn a new concept or approach. Or it could be something like a learning diary that allows developers to record things they learn and plan what they want to learn next. Essentially, what's important is putting your developers in control. Give them the tools to make time - as well as resources that they can use quickly and easily - and they'll not only learn new skills much more quickly, they'll also be more engaged and more curious engineers.

Read next: "All of my engineering teams have a machine learning feature on their roadmap" - Will Ballard talks artificial intelligence in 2019 [Interview]

Barrier 4: Different learning preferences within the team

In even a small engineering team, every person will have unique needs and preferences when it comes to learning. So, even if your whole team wants to learn the same topics, they still might disagree about how to do it. Some developers, for example, can't stand learning with video. It forces you to learn at its pace, and - shock horror - you have to listen to someone. Others, however, love it - it's visual, immediate, and, you know, maybe having someone with a voice explaining how things work isn't actually that bad? Similarly, some people love training courses - the idea of sitting with someone, rather than going it alone - while others value their independence and agency. This means that keeping everyone happy can be tough.

It might be tempting to go one of two ways - either decide how you want people to learn and let them get on with it, or let everyone go it alone - but neither approach is ideal. Forcing people to learn a certain way will penalise those who don't like your preferred learning method, hurting not only their ability to learn new skills but also their trust in you. Letting everyone be fully independent, meanwhile, means you never have any oversight of what people are doing or using - there's no level playing field.

What can you do? The key to getting over this barrier is balance.
You want to preserve your team's independence and sense of agency while also having some transparency over what people are using. Resources that offer a mix of formats are great for this. That way, no one is forced to watch hours and hours of video courses or to trawl through text that will just send them to sleep. Equally, another important thing to think about is how the learning resources you provide your team complement other learning activities that they may be doing independently (things like looking for answers to questions in blog posts or on YouTube). By doing that, you can allow your developers some level of independence while also ensuring that you have a set of organizational resources and tools that are available and accessible to everyone.

Read next: Why do IT teams need to transition from DevOps to DevSecOps?

Barrier 5: The cost of technology training and learning resources

Cost is perhaps the biggest barrier to training for development teams. Specific courses and events can be astronomical - especially if more than one person needs to attend - and even some learning platforms can cost significant amounts of money. Now, that's not always a problem for engineering teams operating in a more corporate or enterprise environment. But for smaller teams in small and medium-sized businesses, training can become quite an overhead. In turn, this can have other consequences - from a complete disregard and deprioritization of training and learning, to internal frustration at who is getting support and investment and who isn't - and it's not uncommon to see training become a catalyst for many other organizational and cultural problems.

What can you do? The trick here is not to invest heavily in one single thing. Don't blow your budget on a single course - what if it's not up to scratch? Don't commit to an expensive learning platform. However good the marketing looks, if your team doesn't like it they're certainly not going to use it. Fortunately, there are affordable solutions out there that can ensure you're not breaking the bank when it comes to training. It might even leave you with some cash left over that you can invest in other resources and materials.

Learning new skills isn't easy. It requires patience and commitment. But developers need support and resources to take some of the strain out of their learning challenges. There's no one way to meet the learning needs of developers. But Packt for Teams can help you overcome many of the barriers to developer training. Learn more here.

article-image-the-accelerate-state-of-devops-2019-report-key-findings-scaling-strategies-and-proposed-performance-productivity-models
Vincy Davis
28 Aug 2019
8 min read

The Accelerate State of DevOps 2019 Report: Key findings, scaling strategies and proposed performance & productivity models

The State of DevOps report is a widely referenced body of DevOps research which helps organizations achieve high organizational performance and productivity. The DORA (DevOps Research and Assessment) team has published the 2019 Accelerate State of DevOps Report, an independent review of the practices and capabilities of DevOps. Teams across the globe can use the survey findings to identify the specific capabilities that will improve their software delivery performance.

Key findings in the DevOps 2019 report

Increase in elite performers

The DevOps 2019 report uses a cluster analysis method to identify the software delivery performance of the participants involved in the survey. All respondents are categorized into four distinct groups: Elite, High, Medium and Low performers. The report states, "In this approach, those in one group are statistically similar to each other and dissimilar from those in other groups, based on our performance behaviors of throughput and stability: deployment frequency, lead time, time to restore service, and change fail rate."

According to the survey, the proportion of elite performers has jumped to 20% from 7% last year. The percentage of medium performers has also increased, leading to a drop in the percentage of low performers. The cluster analysis thus depicts a continued shift in the industry as organizations continue to transform their technology.

Image Source: 2019 Accelerate State of DevOps Report

Read Also: Does it make sense to talk about DevOps engineers or DevOps tools?

Strategies for scaling DevOps in organizations

The Accelerate State of DevOps 2019 report frames several strategies for scaling DevOps in an organization. The strategies are based on commonly used approaches observed by the DORA team across the industry:

- Training Center (DOJO): Employees are taken out of their usual work routines to learn new tools or technologies. They are then expected to implement the newly learned methods in their work and inspire others to do the same.
- Center of Excellence: This strategy concentrates all the available skills in a central group that others consult.
- Proof of Concept but Stall: A central team is given the freedom to build in any possible way (often by breaking organizational norms). However, the effort stalls after the PoC.
- Proof of Concept as a Template: Begins with a Proof of Concept but Stall project, which is then replicated in other groups using the same pattern.
- Proof of Concept as a Seed: This approach ensures that the PoC members stay active in the new groups, either indefinitely or just long enough to ensure that the new practices are sustainable.
- Communities of Practice: Groups sharing common interests in tooling, languages, or methodologies are encouraged within an organization to share knowledge and expertise with each other and across teams.
- Big Bang: The entire organization is transformed to DevOps methodologies at once.
- Bottom-up or Grassroots: Small teams are put together to transform resources and share their success throughout the organization in an informal way.
- Mashup: An organization implements several of the above approaches with partial execution or insufficient resources.

The distribution of DevOps transformation strategies by performance profile is shown below.
Image Source: 2019 Accelerate State of DevOps Report

Cloud computing usage is the driving force behind elite performers

The DORA team uses the five essential characteristics of cloud computing, as defined by the National Institute of Standards and Technology (NIST), to understand cloud usage patterns among the categorized performers. The essential characteristics are:

- On-demand self-service: Users can provision computing resources whenever needed, without any human intervention. This is the least used characteristic, with only 57% of the survey respondents agreeing that they use it.
- Broad network access: Cloud capabilities can be accessed through heterogeneous platforms such as mobile phones, tablets, laptops, and workstations. 60% of the survey respondents agreed on using broad network access.
- Resource pooling: Provider resources are pooled in a multi-tenant model such that physical and virtual resources are dynamically assigned on demand. The customer can specify a location at a higher level of abstraction such as country, state, or datacenter. 58% of respondents agreed on using resource pooling.
- Rapid elasticity: Cloud capabilities are elastically provisioned and can be released rapidly to scale outward or inward on demand, so capacity appears unlimited and can be appropriated at any time in any quantity. Compared to 2018, this characteristic saw 15% growth this year, with 58% of the survey respondents utilizing it.
- Measured service: This is the most used characteristic of cloud computing, with 62% of respondents agreeing on using it. Cloud systems automatically control, optimize, and report resource usage based on the type of service, such as storage, processing, bandwidth, and active user accounts.

The DevOps 2019 survey found that only 29% of all respondents agreed on utilizing all the essential cloud computing characteristics. It also revealed that elite performers were 24 times more likely than low performers to have met all of these characteristics.

Importance of psychological safety to SDO performance

The DevOps survey found that teams with a culture of psychological safety - where members feel safe to take risks and be vulnerable in front of each other - are better able to deliver software delivery performance, organizational performance, and high productivity. This enables an organization to come up with new products and features without impacting existing users, leading to desirable levels of software delivery and operational performance (SDO performance). The report concludes that, thanks to psychological safety, elite performers are twice as likely to achieve or exceed their organizational performance goals.

Size of an industry does not correspond to its success

This year, the retail industry saw significantly better SDO performance in terms of speed and stability of software delivery.
However, the DevOps report states, "We found no evidence that industry has an impact with the exception of retail, suggesting that organizations of all types and sizes, including highly regulated industries such as financial services and government, can achieve high levels of performance." The report suggests that size is not a factor in the poorer performance of other industries, and that high levels of performance can be achieved by adopting DevOps practices.

Read Also: Listen: Puppet's VP of Ecosystem Engineering Nigel Kersten talks about key DevOps challenges [Podcast]

Proposed steps to improve performance and productivity in DevOps

The Accelerate State of DevOps Report of 2019 proposes two research models to improve DevOps performance and productivity.

Performance model

The DevOps report states, "A key goal in digital transformation is optimizing software delivery performance." To improve software delivery and operational (SDO) performance and organizational performance, teams can start with basic automation such as version control and automated testing, monitoring, clear change approval processes, and a healthy culture. The DevOps 2019 survey finds that low performers use more proprietary software than high and elite performers.

Image Source: 2019 Accelerate State of DevOps Report

Productivity model

The report defines productivity as "the ability to get complex, time-consuming tasks completed with minimal distractions and interruptions." Teams can use useful, easy-to-use tools along with internal and external search to accelerate productivity. The DevOps survey reveals that the highest performing engineers are 1.5 times more likely to use easy-to-use tools. Improved productivity also helps employees achieve a better work/life balance and less burnout.

Image Source: 2019 Accelerate State of DevOps Report

DevOps teams can use the above two models to locate their goals, identify the dependent factors, and increase their overall performance and productivity.

Demographic makeup of the DevOps 2019 survey

This year, the DORA survey had almost 1,000 participants. 26% of responses came from employees working in very large companies (10,000+). Compared to last year, there has been a drop in responses from employees working in companies with 500-1,999 employees; in contrast, more responses came from people working in companies of 100-499 employees. The majority of participants worked in the Development or Engineering department, followed by DevOps or Site Reliability Engineering (SRE), then Manager, IT Operations or Infrastructure, and others. Half of the participants in the 2019 research are from North America, followed by EU/UK at 29%. This year also saw a fall in responses from Asia, at only 9% compared to 18% last year. The percentage of women on teams has also fallen to 16% (median) from the 25% reported last year.

Image Source: 2019 Accelerate State of DevOps Report

Developers across the world love the Accelerate State of DevOps 2019 report and are thanking the DORA team for major takeaways like cloud being a key differentiator, how to be an elite performer in software delivery performance, and more.
https://twitter.com/kniklas/status/1165681512538398720
https://twitter.com/nickj69/status/1164707063907225600
https://twitter.com/DawieO/status/1164703456675819522
https://twitter.com/kylekyle/status/1164692941559877632
https://twitter.com/tottiLFC/status/1164800298885402624
https://twitter.com/mirko_novakovic/status/1164606178615341060

Here's a two minute summary video of the Accelerate State of DevOps 2019 report, published by Google Cloud.

https://www.youtube.com/watch?v=8M3WibXvC84

Interested readers can check out the report to see a detailed comparison of tool usage according to low, medium, high, and elite profiles. Also, you can read the full 2019 Accelerate State of DevOps Report for more information.

5 reasons poor communication can sink DevSecOps
7 crucial DevOps metrics that you need to track
Introducing kdevops, a modern DevOps framework for Linux kernel development

article-image-5-reasons-poor-communication-can-sink-devsecops
Guest Contributor
27 Aug 2019
7 min read

5 reasons poor communication can sink DevSecOps

In the last few years, a major shift has occurred within the software industry in terms of how companies organize themselves and manage product responsibilities. The result is a merging of development and operations roles under a single umbrella known as DevOps. However, that's not where the story ends. More and more businesses are beginning to realize that cybersecurity strategy should not be treated as an independent activity. Software reliability is completely dependent upon security and risk management; otherwise, you leave your organization vulnerable to external attacks. The result of this thinking has been an increase in the scope of the DevOps role to add a security aspect as well, known as DevSecOps. But not every company can easily shift to the DevSecOps model overnight, especially when communication issues get in the way. In this article, we'll cover five of the most common roadblocks and ways to avoid them.

1. Unclear responsibilities

Now that the DevSecOps trend has gone mainstream in the software industry, you'll see many new job listings popping up in this area. Unfortunately, some companies are using the term as a catch-all to throw various disconnected duties at a single person. The result can be quite frustrating. Leadership and management need to set a clear scope for the responsibilities of DevSecOps engineers and integrate them directly with other parts of the organization, including development, quality assurance, and cloud administrators. DevSecOps can only work as a strategy if it's fully adopted and understood by people at every level. It's also important not to characterize the DevSecOps concept as a group of tools. The focus should always remain on the individuals performing their duties and not the applications that they use. Clarifying this line of distinction can make it easier to communicate within your organization.

2. Missing connection to end-users

Engineers in the DevSecOps role should be involved at every phase of the software development lifecycle. Otherwise, they will not have a holistic view of the platform's security status and how it is functioning. In a worst-case scenario, the DevSecOps team is disconnected from end-users entirely. If an engineer does not understand how real people are using their application, then the product as a whole is likely doomed. User requirements should form the basis of every coding project, and supporting the development lifecycle is only possible if that link exists and is maintained.

3. Too many (and unsecured) communication tools

Engineers in a DevSecOps role often spend the majority of their days coordinating between other groups within the organization. This activity can't succeed unless there is a strong communication toolset at their disposal. One mistake many companies make is deciding to invest in dozens of different chat, messaging, and conferencing apps in hopes that it will make things easier. The problem is that easy online communication comes at the price of privacy. Some platforms retain your data for their own internal use or even to sell - in the form of uniquely identifiable IP addresses - to advertisers. Though it's relatively easy to hide your IP address, do you want to trust an app that plays fast and loose with your information in the first place?
One way a DevSecOps team can address this issue is to emphasize to decision-makers the security risks posed by many popular tools. Slack, WhatsApp, Snapchat and others are recent examples of popular messaging apps that are now taking flak because of the different security risks they pose. When it comes to email, even Gmail, the most popular email service, has been caught allowing unfettered access to user email addresses. Our advice is to use an encrypted email tool such as ProtonMail or Mailfence rather than rely on the usual suspects with better name recognition. The more communication tools you use, the larger the threat surface vulnerable to hackers.

4. Alert fatigue

One key part of the DevSecOps suite of responsibilities is to streamline all monitoring and alerting functions across the organization. The goal is to make it easy for both managers and engineers to find out about live issues and quickly navigate to the source of the problem. In some cases, DevSecOps engineers will be asked to set very sensitive alerting protocols so that no potential problem is missed. There is a downside, though, because having too many notifications can lead to alert fatigue. This is especially true if your monitoring tools are throwing false positives all day long. A string of unnecessary alerts will cause people to stop paying attention to them. The approach to alerting should be well thought out and clearly documented in runbooks by the DevSecOps team. A runbook should explain exactly what an alert means and the steps required to address it. This level of documentation allows DevSecOps engineers to outsource incident response to a larger group.

5. Hidden dependencies

Because of the wide scope of the DevSecOps role, organizations sometimes expect engineers to be fortune-tellers, able to predict how changes will impact code, tests, and security. This level of confidence cannot be reached unless there is clear and consistent communication across the company. Take, for example, a decision to add firewall protection around a database server to block outside threats. This will probably seem like a simple change to the engineers working on the system, but they may not realize that a new firewall could cut off connections to other services within the same infrastructure. If DevSecOps had been involved in the meetings and decision making, then this type of hidden dependency could have been uncovered earlier. The DevSecOps model can only succeed if the organization has a strong policy of change management. Any modification to a live system should be thoroughly vetted by representatives of all teams. At that time, risks can be weighed and approvals can be made. Changes should be scheduled at times when the impact will be minimal.

Final thoughts

When browsing job listings, you'll surely see an influx of roles mentioning both DevOps and DevSecOps. These roles can have an incredibly wide scope and often play a critical role in the success of a software company. But breakdowns in communication have the potential to derail the goals of DevSecOps, putting the entire organization at risk.
Modern software development is all about being agile, meaning that requirements gathering and coding happen with great flexibility and fluidity. The same should be true for DevSecOps. The duties of this role should be evaluated and tweaked on a regular basis, and clear communication is the best way to go about it.

Author Bio

Gary Stevens is a front-end developer. He's a full-time blockchain geek and a volunteer working for the Ethereum foundation, as well as an active GitHub contributor.

Why do IT teams need to transition from DevOps to DevSecOps?
The seven deadly sins of web design
Dark Web Phishing Kits: Cheap, plentiful and ready to trick you

article-image-rust-is-the-future-of-systems-programming-c-is-the-new-assembly-intel-principal-engineer-josh-triplett
Bhagyashree R
27 Aug 2019
10 min read

“Rust is the future of systems programming, C is the new Assembly”: Intel principal engineer, Josh Triplett

At Open Source Technology Summit (OSTS) 2019, Josh Triplett, a Principal Engineer at Intel, gave an insight into what Intel is contributing to bring the most loved language, Rust, to full parity with C. In his talk titled Intel and Rust: the Future of Systems Programming, he also spoke about the history of systems programming, how C became the "default" systems programming language, what features give Rust an edge over C, and much more.

Until now, OSTS was Intel's closed event, where the company's business and tech leaders came together to discuss the trends, technologies, and innovations that will help shape the open-source ecosystem. This year was different, as the company welcomed non-Intel attendees, including media, partners, and developers, for the first time. The event hosts keynotes, more than 50 technical sessions, panels, and demos covering all the open source technologies Intel is involved in, including integrated software stacks (edge, AI, infrastructure), firmware, embedded and IoT projects, and cloud system software. This year the event took place from May 14-16 at Stevenson, Washington.

What is systems programming

Systems programming is the development and management of software that serves as a platform for other software to be built upon. System software also directly or closely interfaces with computer hardware in order to gain necessary performance and expose abstractions. Unlike application programming, where software is created to provide services to the user, systems programming aims to produce software that provides services to the computer hardware.

Triplett broadly defines systems programming as "anything that isn't an app." It includes things like BIOS, firmware, boot loaders, operating system kernels, embedded and similar types of low-level code, and virtual machine implementations. Triplett also counts a web browser as system software, as it is more than "just an app"; browsers are actually "platforms for websites and web apps," he says.

How C became the "default" systems programming language

Previously, most system software, including BIOS, boot loaders, and firmware, was written in Assembly. In the 1960s, experiments to bring hardware support to high-level languages began, resulting in the creation of languages such as PL/S, BLISS, BCPL, and extended ALGOL. Then, in the 1970s, Dennis Ritchie created the C programming language for the Unix operating system. Derived from the typeless B programming language, C was packed with powerful high-level functionality and detailed features that were best suited for writing an operating system. Several UNIX components, including its kernel, were eventually rewritten in C. Much other system software, including the Oracle database, a large portion of the Windows source code, and the Linux operating system, was also written in C.

C was seeing huge adoption at this point. But what exactly made developers comfortable moving to C? Triplett believes that in order to make this move from one language to another, developers have to be comfortable in terms of two things: features and parity. First, the language should offer "sufficiently compelling" features. "It can't just be a little bit better. It has to be substantially better to warrant the effort and engineering time needed to move," he adds. Compared to Assembly, C had a lot to offer. It had some degree of type safety, provided portability, better productivity with high-level constructs, and much more readable code.
Second, the language has to provide parity, meaning developers have to be confident that it is no less capable than Assembly. He states, "It can't just be better, it also has to be no worse." In addition to being faster and able to express any type of data that Assembly could, C also had what Triplett calls an "escape hatch": it allowed developers to make the move incrementally and to combine C with Assembly where required.

Triplett believes that C is now becoming what Assembly was years ago. "C is the new Assembly," he concludes. Developers are looking for a high-level language that not only addresses the problems in C that can't be fixed, but also offers other exciting features. Such a language, to be compelling enough to make developers move away from C, should be memory safe and provide automatic memory management, security, and much more.

"Any language that wants to be better than C has to offer a lot more than just protection from buffer overflows if it's actually going to be a compelling alternative. People care about usability and productivity. They care about writing code that is self-explanatory, which accomplishes more work in less code. It also needs to address security issues. Usability and productivity go hand in hand with security. The less code you need to write to accomplish something, the less chance you have of introducing bugs, security bugs or otherwise," he explains.

Comparing Rust with C

Back in 2006, Graydon Hoare, a Mozilla employee, started writing Rust as a personal project. In 2009, Mozilla began sponsoring the project and expanded the team to drive further development of the language. One of the reasons Mozilla got interested is that Firefox was written in more than 4 million lines of C++ code and had quite a few highly critical vulnerabilities. Rust was built with safety and concurrency in mind, making it the perfect choice for rewriting many components of Firefox under Project Quantum. Mozilla is also using Rust to develop Servo, an HTML rendering engine that will eventually replace Firefox's rendering engine. Many other companies have also started using Rust for their projects, including Microsoft, Google, Facebook, Amazon, Dropbox, Fastly, Chef, Baidu, and more.

Rust addresses the memory management problem in C. It offers automatic memory management so that developers do not have to manually call free on every object. What sets it apart from other modern languages is that it does not have a garbage collector or runtime system of any kind. Rust instead has the concepts of ownership, borrowing, references, and lifetimes. "Rust has a system of declaring whether any given use of an object is the owner of that object or whether it's just borrowing that object temporarily. If you're just borrowing an object the compiler will keep track of that. It'll make sure that the original sticks around as long as you reference it. Rust makes sure that the owner of the object frees it when it's done and it inserts the call to free at compile time with no extra runtime overhead," Triplett explains.
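To make the ownership and borrowing model Triplett describes concrete, here is a minimal sketch (our illustration, not code from the talk; the names are made up) of how the compiler enforces these rules:

```rust
fn main() {
    let owner = String::from("systems programming"); // `owner` owns the heap allocation

    print_len(&owner); // lend the value out temporarily: a borrow, ownership stays put

    let new_owner = owner; // ownership moves; `owner` may no longer be used

    // println!("{}", owner); // compile error: value was moved to `new_owner`
    println!("{}", new_owner);
} // `new_owner` goes out of scope; the compiler inserts the call to free here

// Borrowing: the caller keeps ownership; this function only gets a temporary reference.
fn print_len(s: &str) {
    println!("{} is {} bytes long", s, s.len());
}
```

Because all of these checks happen at compile time, there is no garbage collector or runtime bookkeeping involved, which is exactly the property Triplett highlights.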
Not having a runtime is also a plus for Rust. Triplett believes that languages that have a runtime are difficult to use for systems programming. He adds, "You have to initialize that runtime before you can call any code, you have to use that runtime to call functions, and the runtime itself might run extra code behind your back at unexpected times."

Rust also aims to provide safe concurrent programming. The same features that make it memory safe keep track of things like which thread owns which object, which objects can be passed between threads, and which objects require acquiring locks. These features make Rust compelling enough for developers to choose for systems programming. However, on the second criterion, Rust does not yet have full parity with C. "Achieving parity with C is exactly what got me involved in Rust," says Triplett.

Teaching Rust about C-compatible unions

Triplett's first contribution to the Rust programming language came in the form of RFC 1444, which was started in 2015 and accepted in 2016. This RFC proposed bringing native support for C-compatible unions to Rust, defined via a new "contextual keyword" union. Triplett understood the need for this proposal when he wanted to build a virtual machine in Rust and the Linux kernel interface for that, /dev/kvm, required unions. "I worked with the Rust community and with the language team to get unions into Rust and because of that work I'm actually now part of the Rust language governance team helping to evaluate and guide other changes into the language," he adds.
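As a rough sketch of what RFC 1444 enables (our illustration, not an example from the talk), a union declares several fields that share the same storage, matching the layout of a C union; because the compiler cannot know which field is currently valid, reads must go through unsafe:

```rust
// A C-compatible union: both fields share the same 8 bytes of storage.
#[repr(C)]
union Value {
    integer: u64,
    float: f64,
}

fn main() {
    let v = Value { integer: 0x3FF0_0000_0000_0000 }; // the bit pattern of 1.0 as an f64

    // Reading a union field reinterprets the shared bytes, so it is unsafe:
    // the compiler cannot track which field was last written.
    unsafe {
        println!("as integer: {:#x}", v.integer);
        println!("as float:   {}", v.float); // prints 1
    }
}
```

This kind of layout compatibility is what allows Rust code to talk to C interfaces such as /dev/kvm that are defined in terms of unions.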
He talked about this RFC in much detail at the very first RustConf in 2016: https://www.youtube.com/watch?v=U8Gl3RTXf88

Support for unnamed struct and union types

Another feature that Triplett worked on was support for unnamed struct and union types in Rust. This has been a widespread C compiler extension for decades and was also included in the C11 standard. It allows developers to group and lay out fields in arbitrary ways to match C data structures used in the Foreign Function Interface (FFI). With this proposal implemented, Rust will be able to represent such types using the same names as the structures, without interposing artificial field names that would confuse users of well-established interfaces from existing platforms.

A stabilized support for inline Assembly in Rust

Systems programming often involves low-level manipulations and requires low-level details of the processors, such as privileged instructions. For this, Rust supports inline Assembly via the asm! macro. However, it is only present in the nightly compiler and not yet stabilized. Triplett, in collaboration with other Rust developers, is writing a proposal to introduce a more robust syntax for inline Assembly. To know more about support for inline Assembly, check out this pre-RFC.

BFLOAT16 support into Rust

Many Intel processors, including Xeon Scalable 'Cooper Lake-SP', now support BFLOAT16, a new floating-point format. This truncated 16-bit version of the 32-bit IEEE 754 single-precision floating-point format was mainly designed for deep learning. The format is also used in machine learning libraries like Tensorflow that work with huge datasets, and it makes interoperating with existing systems, functions, and storage much easier. This is why Triplett is working on adding support for BFLOAT16 in Rust, so that developers can use the full capabilities of their hardware.

FFI/C Parity Working Group

This was one of the important announcements that Triplett made. He is starting a working group that will focus on achieving full parity with C. Under this group, he aims to collaborate with both the Rust community and other Intel developers to develop specifications for the remaining features that need to be implemented in Rust for systems programming. The group will also focus on bringing support for systems programming to the stable releases of Rust, not just the experimental nightly releases of the compiler. In last week's Reddit discussion, Triplett shared the current status of the working group: "To pre-answer one question: the FFI / C Parity working group is in the process of being launched, and hasn't quite kicked off yet. I'll be posting about it here and elsewhere when it is, along with the initial goals."

Watch Josh Triplett's full OSTS talk to know more about Intel's contribution to Rust: https://www.youtube.com/watch?v=l9hM0h6IQDo

Update: We have made the following corrections based on feedback from Josh Triplett:
- This year OSTS was open to Intel's partners and press.
- Previously, the article read 'escape patch', but it is 'escape hatch.'
- RFC 1444 wasn't last year; it was started in 2015 and accepted in 2016.
- 'dev KVM' is now corrected to '/dev/kvm'.

AMD competes with Intel by launching EPYC Rome, world's first 7 nm chip for data centers, luring in Twitter and Google
Hot Chips 31: IBM Power10, AMD's AI ambitions, Intel NNP-T, Cerebras largest chip with 1.2 trillion transistors and more
Intel's 10th gen 10nm 'Ice Lake' processor offers AI apps, new graphics and best connectivity
article-image-react-forces-leaders-to-confront-community-toxic-culture
Sugandha Lahoti
27 Aug 2019
7 min read

#Reactgate forces React leaders to confront community's toxic culture head on

On Thursday last week, the Twitter account @heydonworks posted a tweet saying that "Vue developers like cooking/quiet activities and React developers like trump, guns, weightlifting and being 'bros'". He also talked about the rising number of super conservative React dev accounts.

https://twitter.com/heydonworks/status/1164506235518910464

This was met with disapproval from people within both the React and Vue communities. "Front end development isn't a competition," remarked one user.

https://twitter.com/mattisadev/status/1164633489305739267
https://twitter.com/nsantos_pessoal/status/1164629726499102720

@heydonworks responded to the chorus of disapproval by saying that his intention was to highlight how a broad and diverse community of thousands of people can be eclipsed by an aggressive and vocal toxic minority. He then went on to ask Dan Abramov, a member of the React core team, "Perhaps a public disowning of the neocon / supremacist contingent on your part would land better than my crappy joke?"

https://twitter.com/heydonworks/status/1164653560598093824

He also clarified that his original tweet was supposed to paint a picture of what React would be like if it were taken over by hypermasculine conservatives. "I admit it's not obvious", he tweeted, "but I am on your side. I don't want that to happen and the joke was meant as a warning."

@heydonworks also accused a well known React developer of playing "the circle game" at a React conference. The "circle game" is a school prank that has more recently come to be associated with white supremacism in the U.S. @heydonworks later deleted this tweet and issued an apology admitting that he was wrong to accuse the person of making the gesture.

https://twitter.com/heydonworks/status/1165439718512824320

This conversation then developed into a wider argument about how toxicity is enabled and allowed in the React community - and, indeed, in other tech communities as well. The crucial point that many will have to reckon with is what behaviors people allow and overlook. Indeed, to a certain extent, the ability to be comfortable with certain behaviors is related to an individual's privilege - what may seem merely a quirk of someone's persona to one person might be threatening and a cause of discomfort to another.

This was the point made by web developer Nat Alison (@tesseralis): "Remember that fascists and abusers can often seem like normal people to everyone but the people that they're harming." Alison's thread highlights that associating with people without challenging toxic behaviors or attitudes is a way of enabling and tacitly supporting them.

https://twitter.com/tesseralis/status/1165111494062641152

Web designer Tatiana Mac quits the tech industry following the React controversy

Web designer Tatiana Mac's talk at Clarity Conf (you can see the slides here) in San Francisco last week (21 August) took place just a few hours before @heydonworks sent the first of the tweets mentioned above. The talk was a powerful statement on how systems can be built in ways that either reinforce power or challenge it. Although it was well received by many present at the event and online, it was also met with hostility, with one Twitter user (now locked) tweeting in response to an image of Mac's talk that it "most definitely wasn't a tech conference… Looks to be some kind of SJW (Social justice warrior) conference." This only added an extra layer of toxicity to the furore that has been engulfing the React community.
Following the talk, Mac offered her thoughts, criticizing those she described as being more interested in "protecting the reputation of a framework than listening to multiple marginalized people."

https://twitter.com/TatianaTMac/status/1164912554876891137

She adds, "I don't perceive this problem in the other JS framework communities as intensively. Do White Supremacists exist in other frameworks? Likely. But there is a multiplier/feeder here that is systemically baked. That's what I want analysed by the most ardent supporters of the community."

She says that even after raising this issue multiple times, she has been consistently ignored. Her tweet reads, "I'm disappointed by repeatedly bringing this shit up and getting ignored/gaslit, then having a white woman bring it up and her getting praised for it? White supremacy might as well be an opiate—some people take it without ever knowing, others microdose it to get ahead."

"Why is no one like, 'Tatiana had good intentions in bringing up the rampant racism problem in our community?' Instead, it's all, 'Look at all the impact it had on two white guys!' Is cuz y'all finally realise intent doesn't erase impact?", she adds.

She has since decided to quit the tech industry following these developments. In a tweet, she wrote that she is "incredibly sad, disappointed, and not at all surprised by *so* many people." Mac has described in detail the emotional and financial toll the situation is having on her. She has said she is committed to all contracts through to 2020, but also revealed that she may need to sell belongings to support herself. This highlights the potential cost involved in challenging the status quo. To provide clarity on what has happened, Tatiana approached her friend, designer Carlos Eriksson, who put together a timeline of the Reactgate controversy.

Dan Abramov and Ken Wheeler quit and then rejoin Twitter

Following the furore, both Dan Abramov and Ken Wheeler quit Twitter over the weekend. They have now rejoined. After he deactivated, Abramov talked about his disappearance on Reddit: "Hey all. I'm fine, and I plan to be back soon. This isn't a 'shut a door in your face' kind of situation. The real answer is that I've bit off more social media than I can chew. I've been feeling anxious for the past few days and I need a clean break from checking it every ten minutes. Deactivating is a barrier to logging in that I needed. I plan to be back soon."

Abramov returned to Twitter on August 27 and apologized for his sudden disappearance, calling deactivating his account "a desperate and petty thing." He also thanked Tatiana Mac for highlighting issues in the React community. "I am deeply thankful to @TatianaTMac for highlighting issues in the React community," Abramov wrote. "She engaged in a dialog despite being on the receiving end of abuse and brigading. I admire her bravery and her kindness in doing the emotional labor that should have fallen on us instead."

Wheeler also returned to Twitter. "Moving forward, I will be working to do better. To educate myself. To lift up minoritized folks. And to be a better member of the community. And if you are out there attacking and harassing people, you are not on my side," he said.

Mac acknowledged Abramov and Wheeler's apologies, writing that "it is unfair and preemptive to call Dan and Ken fragile. Both committed to facing the white supremacist capitalist patriarchy head on.
I support the promise and will be watching from the sidelines supporting positive influence."

What can the React community do to grow from this experience?

This news has shaken the React community to the core. At such a distressing time, the community needs to come together as a whole and offer constructive criticism to tackle the issue of unhealthy tribalism, while making minority groups feel safe and heard. Tatiana puts forward a few points for tackling the toxic culture: "Pay attention to your biggest proponents and how they reject all discussion of the injustices of tech. It's subtle like that, and, it's as overt as throwing white supremacist hand gestures at conferences on stage. Neither is necessarily more dangerous than the other, but instead shows the journey and spectrum of radicalization—it's a process."

She urges, "If you want to clean up the community, you've got to see what systemic forces allow these hateful dingdongs to sit so comfortably in your space. I'm here to help and hope I have today already, as a member of tech, but I need you to do the work there."

"Developers don't belong on a pedestal, they're doing a job like everyone else" – April Wensel on toxic tech culture and Compassionate Coding [Interview]
Github Sponsors: Could corporate strategy eat FOSS culture for dinner?
Microsoft's #MeToo reckoning: female employees speak out against workplace harassment and discrimination

article-image-mozilla-proposes-webassembly-interface-types-to-enable-language-interoperability
Bhagyashree R
23 Aug 2019
4 min read

Mozilla proposes WebAssembly Interface Types to enable language interoperability

WebAssembly will soon be able to use the same high-level types in Python, Rust, and Node, says Lin Clark, a Principal Research Engineer at Mozilla, with the help of a new proposal: WebAssembly Interface Types. This proposal aims to add a new set of interface types to WebAssembly that describe high-level values like strings, sequences, records, and variants.

https://twitter.com/linclark/status/1164206550010884096

Why WebAssembly Interface Types matter

Mozilla and many other companies have been putting effort into bringing WebAssembly outside the browser, with projects like WASI and Fastly's Lucet. Developers also want to run WebAssembly from different source languages like Python, Ruby, and Rust. Clark believes there are three reasons why developers want to do that. First, it allows them to easily use native modules and deliver better speed to their application users. Second, they can use WebAssembly to sandbox native code for better security. Third, they can save time and maintenance cost by sharing native code across platforms.

However, this "cross-language integration" is currently very complicated. The problem is that WebAssembly today only supports numbers, so things get difficult in cases like passing a string between JS and WebAssembly: you first have to convert the string into an array of numbers, and then convert those numbers back into a string on the other side. "This means the two languages can call each other's functions. But if a function takes or returns anything besides numbers, things get complicated," Clark explains. So, to get past this hurdle, you either need to write "a really hard-to-use API that only speaks in numbers" or "add glue code for every single environment you want this module to run in."
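To illustrate that numbers-only boundary (a hedged sketch of our own, not code from Clark's post; the function name is made up), here is what a Rust module compiled to WebAssembly has to do today to accept a "string": the host copies the string's bytes into the module's linear memory, then calls an export that takes only a pointer and a length:

```rust
// Build with: cargo build --target wasm32-unknown-unknown
// The exported signature speaks only in numbers: a pointer and a length.
#[no_mangle]
pub extern "C" fn count_words(ptr: *const u8, len: usize) -> u32 {
    // Reinterpret the raw numbers the host handed us as a UTF-8 string.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    match std::str::from_utf8(bytes) {
        Ok(text) => text.split_whitespace().count() as u32,
        Err(_) => 0, // the host passed bytes that were not valid UTF-8
    }
}
```

Interface types aim to make this glue unnecessary: a function would declare that it takes a string, and each side would map that to its own native representation, with values copied across the boundary.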
This is why Clark and her team have come up with WebAssembly Interface Types. The proposal will allow WebAssembly modules to interoperate with modules running in their own native runtimes and with other WebAssembly modules written in different source languages. Modules will also be able to talk directly to host systems. All of this will be achieved using rich APIs and complex types.

Source: Mozilla

WebAssembly Interface Types are different from the types we have in WebAssembly today, and they will not add any new operations to WebAssembly. All operations will be performed on the concrete types on the two communicating sides. Explaining how this will work, Clark wrote, "There's one key point that makes this possible: with interface types, the two sides aren't trying to share a representation. Instead, the default is to copy values between one side and the other."

What WebAssembly developers think about this proposal

The news sparked a discussion on Hacker News. One user commented that this could in the future prevent a lot of rewrites and duplication: "I'm very happy to see the WebIDL proposal replaced with something generalized. The article brings up an interesting point: WebAssembly really could enable seamless cross-language integration in the future. Writing a project in Rust, but really want to use that popular face detector written in Python? And maybe the niche language tokenizer written in PHP? And sprinkle ffmpeg on top, without the hassle of target-compatible compilation and worrying about use after free vulnerabilities? No problem use one of the many WASM runtimes popping up and combine all those libraries by using their pre-compiled WASM packages distributed on a package repo like WAPM, with auto-generated bindings that provide a decent API from your host language."

Another user added, "Of course, cross-language interfaces will always have tradeoffs. But we see Interface Types extending the space where the tradeoffs are worthwhile, especially in combination with wasm's sandboxing."

Some users are also unsure that this will actually work in practice. Here's what a Reddit user said: "I wonder how well this will work in practice. effectively this is attempting to be universal language interop. that is a bold goal. I suspect this will never work for complicated object graphs. maybe this is for numbers and strings only. I wonder if something like protobuf wouldn't actually be better. it looked from the graphics that memory is still copied anyway (which makes sense, eg going from a cstring to a java string), but this is still marshalling. maybe you can skip this in some cases, but is that important enough to hinge the design there?"

To get a deeper understanding of WebAssembly Interface Types, watch this explainer video by Mozilla: https://www.youtube.com/watch?time_continue=17&v=Qn_4F3foB3Q

Also, check out Lin Clark's article, WebAssembly Interface Types: Interoperate with All the Things.

Wasmer introduces WebAssembly Interfaces for validating the imports and exports of a Wasm module
Fastly CTO Tyler McMullen on Lucet and the future of WebAssembly and Rust [Interview]
LLVM WebAssembly backend will soon become Emscripten's default backend, V8 announces

article-image-security-researcher-publicly-releases-second-steam-zero-day-after-being-banned-from-valves-bug-bounty-program
Savia Lobo
22 Aug 2019
6 min read

Security researcher publicly releases second Steam zero-day after being banned from Valve's bug bounty program

Updated with Valve's response: In a statement on August 22, Valve said that its HackerOne bug bounty program should not have turned away Kravets when he reported the second vulnerability, calling it "a mistake".

A Russian security researcher, Vasily Kravets, has found a second zero-day vulnerability in the Steam gaming platform within a span of two weeks. The researcher said he reported the first Steam zero-day earlier in August to Steam's parent company, Valve, and tried to have it fixed before public disclosure. However, "he said he couldn't do the same with the second because the company banned him from submitting further bug reports via its public bug bounty program on the HackerOne platform," ZDNet reports.

Source: amonitoring.ru

The first flaw was a "privilege-escalation vulnerability that can allow an attacker to level up and run any program with the highest possible rights on any Windows computer with Steam installed. It was released after Valve said it wouldn't fix it (Valve then published a patch, that the same researcher said can be bypassed)," according to Threatpost.

Although Kravets was banned from the HackerOne platform, on Tuesday he disclosed the second flaw, which enables local privilege escalation in the Steam client, and said that the flaw would be simple for any OS user to exploit. Kravets told Threatpost that he is not aware of a patch for the vulnerability. "Any user on a PC could do all actions from exploit's description (even 'Guest' I think, but I didn't check this). So [the] only requirement is Steam," Kravets told Threatpost. He also said, "It's sad and simple — Valve keeps failing. The last patch, that should have solved the problem, can be easily bypassed so the vulnerability still exists. Yes, I've checked, it works like a charm."

Another security researcher, Matt Nelson, also said he had found the exact same bug as Kravets, which "he too reported to Valve's HackerOne program, only to go through a similar bad experience as Kravets," ZDNet reports. He said both Valve and HackerOne took five days to acknowledge the bug and later refused to patch it. Further, they locked the bug report when Nelson wanted to disclose it publicly and warn users. "Nelson later released proof-of-concept code for the first Steam zero-day, and also criticized Valve and HackerOne for their abysmal handling of his bug report", ZDNet reports.

https://twitter.com/enigma0x3/status/1148031014171811841

"Despite any application itself could be harmful, achieving maximum privileges can lead to much more disastrous consequences. For example, disabling firewall and antivirus, rootkit installation, concealing of process-miner, theft any PC user's private data — is just a small portion of what could be done," said Kravets.

Kravets demonstrated the second Steam zero-day and also detailed the vulnerability on his website. Per Threatpost, as of August 21, "Valve did not respond to a request for comment about the vulnerability, bug bounty incident and whether a patch is available. HackerOne did not have a comment."

Other researchers who have participated in Valve's bug bounty program are infuriated by Valve's decision not only to block Kravets from submitting further bug reports, but also to refuse to patch the flaw.
https://twitter.com/Viss/status/1164055856230440960
https://twitter.com/kamenrannaa/status/1164408827266998273

A user on Reddit writes, "If management isn't going to take these issues seriously and respect a bug bounty program, then you need to bring about some change from within. Now they are just getting bug reports for free."

According to Nelson, the HackerOne "representative said the vulnerability was out of scope to qualify for Valve's bug bounty program," Ars Technica writes. Further, when Nelson said that he was not seeking any monetary gain and only wanted the public to be aware of the vulnerability, the HackerOne representative asked Nelson to "please familiarize yourself with our disclosure guidelines and ensure that you're not putting the company or yourself at risk. https://www.hackerone.com/disclosure-guidelines."

https://twitter.com/enigma0x3/status/1160961861560479744

Nelson also reported the vulnerability directly to Valve. Valve first acknowledged the report and "noted that I shouldn't expect any further communication." He never heard anything more from the company.

In an email to Ars Technica, Nelson writes, "I can certainly believe that the scoping was misinterpreted by HackerOne staff during the triage efforts. It is mind-blowing to me that the people at HackerOne who are responsible for triaging vulnerability reports for a company as large as Valve didn't see the importance of Local Privilege Escalation and simply wrote the entire report off due to misreading the scope."

A HackerOne spokeswoman told Ars Technica, "We aim to explicitly communicate our policies and values in all cases and here we could have done better. Vulnerability disclosure is an inherently murky process and we are, and have always been, committed to protecting the interests of hackers. Our disclosure guidelines emphasize mutual respect and empathy, encouraging all to act in good faith and for the benefit of the common good."

Katie Moussouris, founder and CEO of Luta Security, also said, "Silencing the researcher on one issue is in complete violation of the ISO standard practices, and banning them from reporting further issues is simply irresponsible to affected users who would otherwise have benefited from these researchers continuing to engage and report issues privately to get them fixed. The norms of vulnerability disclosure are being warped by platforms that put profits before people."

Valve agrees that turning down Kravets' report was "a mistake"

In a statement on August 22, Valve said that its HackerOne bug bounty program should not have turned away Kravets when he reported the second vulnerability, and called it a mistake. In an email statement to ZDNet, a Valve representative said that "the company has shipped fixes for the Steam client, updated its bug bounty program rules, and is reviewing the researcher's ban on its public bug bounty program."

The company also writes, "Our HackerOne program rules were intended only to exclude reports of Steam being instructed to launch previously installed malware on a user's machine as that local user. Instead, misinterpretation of the rules also led to the exclusion of a more serious attack that also performed local privilege escalation through Steam. In regards to the specific researchers, we are reviewing the details of each situation to determine the appropriate actions. We aren't going to discuss the details of each situation or the status of their accounts at this time."

To know more about this news in detail, read Kravets' blog post.
You can also check out Threatpost's detailed coverage.

Puppet launches Puppet Remediate, a vulnerability remediation solution for IT Ops
A second zero-day found in Firefox was used to attack Coinbase employees; fix released in Firefox 67.0.4 and Firefox ESR 60.7.2
The EU Bounty Program enabled in VLC 3.0.7 release, this version fixed the most number of security issues
article-image-bitbucket-to-no-longer-support-mercurial-users-must-migrate-to-git-by-may-2020
Fatema Patrawala
21 Aug 2019
6 min read

Bitbucket to no longer support Mercurial, users must migrate to Git by May 2020

Yesterday marked the end of an era for Mercurial users, as Bitbucket announced that it will no longer support Mercurial repositories after May 2020. Bitbucket, owned by Atlassian, is a web-based version control repository hosting service for source code and development projects. It has supported Mercurial since its launch in 2008, and Git since October 2011. Now, almost ten years on, the Bitbucket team has decided to remove Mercurial support from Bitbucket Cloud and its API. The official announcement reads, "Mercurial features and repositories will be officially removed from Bitbucket and its API on June 1, 2020."

The Bitbucket team also communicated the timeline for sunsetting the Mercurial functionality. After February 1, 2020, users will no longer be able to create new Mercurial repositories. After June 1, 2020, users will not be able to use Mercurial features in Bitbucket or via its API, and all Mercurial repositories will be removed. All current Mercurial functionality in Bitbucket will remain available through May 31, 2020.

The team said the decision was not an easy one and that Mercurial held a special place in their heart. But according to a Stack Overflow Developer Survey, almost 90% of developers use Git, while Mercurial is the least popular version control system with only about 3% developer adoption. On top of this, Mercurial usage on Bitbucket has seen a steady decline, and the percentage of new Bitbucket users choosing Mercurial has fallen to less than 1%. Hence the decision to remove the Mercurial repositories.

How can users migrate and export their Mercurial repos

The Bitbucket team recommends that users migrate their existing Mercurial repositories to Git. They have also extended support for migration and kept the options open for discussion in a dedicated Community thread, where users can discuss conversion tools and migration, share tips, and get troubleshooting help. If users prefer to continue using Mercurial, there are a number of free and paid Mercurial hosting services available to them. The Bitbucket team has also created a Git tutorial that covers everything from the basics of creating pull requests to rebasing and Git hooks.

Community shows anger and sadness over decision to discontinue Mercurial support

There is outrage among Mercurial users, who are extremely unhappy and sad about this decision and have expressed their anger across multiple forums and community discussions. Users feel that Bitbucket's decision to stop offering Mercurial support is bad, but the decision to also delete the repos is evil.

On Hacker News, users speculated that this decision was influenced by marketing momentum rather than technically superior architecture and ease of use. They feel GitHub has successfully marketed Git, and that is how the two have become synonymous within the developer community. One of them comments, "It's very sad to see bitbucket dropping mercurial support. Now only Facebook and volunteers are keeping mercurial alive. Sometimes technically better architecture and user interface lose to a non user friendly hard solutions due to inertia of mass adoption. So a lesson in Software development is similar to betamax and VHS, so marketing is still a winner over technically superior architecture and ease of use. GitHub successfully marketed git, so git and GitHub are synonymous for most developers.
Now majority of open source projects are reliant on a single proprietary solution Github by Microsoft, for managing code and project. Can understand the difficulty of bitbucket, when Python language itself moved out of mercurial due to the same inertia. Hopefully gitlab can come out with mercurial support to migrate projects using it from bitbucket."

Another user comments that Mercurial support was the only reason they still used Bitbucket, since GitHub is miles ahead in every other respect, and that without it Bitbucket will fade away: "Mercurial support was the one reason for me to still use Bitbucket: there is no other Bitbucket feature I can think of that Github doesn't already have, while Github's community is miles ahead since everyone and their dog is already there. More importantly, Bitbucket leaves the migration to you (if I read the article correctly). Once I download my repo and convert it to git, why would I stay with the company that just made me go through an annoying (and often painful) process, when I can migrate to Github with the exact same command? And why isn't there a "migrate this repo to git" button right there? I want to believe that Bitbucket has smart people and that this choice is a good one. But I'm with you there - to me, this definitely looks like Bitbucket will die."

On Reddit, programmers see this as a big change from Bitbucket, as it is the major Mercurial hosting provider, and they feel Bitbucket announced it at pretty short notice, leaving too little time for migration. Users have also expressed displeasure on the Atlassian community blog. A team of scientists commented, "Let's get this straight : Bitbucket (offering hosting support for Mercurial projects) was acquired by Atlassian in September 2010. Nine years later Atlassian decides to drop Mercurial support and delete all Mercurial repositories. Atlassian, I hate you :-) The image you have for me is that of a harmful predator. We are a team of scientists working in a university. We don't have computer scientists, we managed to use a version control simple as Mercurial, and it was a hard work to make all scientists in our team to use a version control system (even as simple as Mercurial). We don't have the time nor the energy to switch to another version control system. But we will, forced and obliged. I really don't want to check out Github or something else to migrate our projects there, but we will, forced and obliged."

Atlassian Bitbucket, GitHub, and GitLab take collective steps against the Git ransomware attack
Attackers wiped many GitHub, GitLab, and Bitbucket repos with 'compromised' valid credentials leaving behind a ransom note
BitBucket goes down for over an hour

Google open sources an on-device, real-time hand gesture recognition algorithm built with MediaPipe

Sugandha Lahoti
21 Aug 2019
3 min read
Google researchers have unveiled a new real-time hand tracking algorithm that could be a breakthrough for people communicating via sign language. The algorithm uses machine learning to compute 3D keypoints of a hand from a video frame. The research is implemented in MediaPipe, an open-source, cross-platform framework for building multimodal (e.g. video, audio, or any time-series data) applied ML pipelines. What is especially interesting is that the 3D hand perception can be viewed in real time on a mobile phone.

How real-time hand perception and gesture recognition work with MediaPipe

The algorithm is built using the MediaPipe framework, within which the pipeline is constructed as a directed graph of modular components. The pipeline employs three different models: a palm detector, a hand landmark detector, and a gesture recognizer.

The palm detector operates on full images and outputs an oriented bounding box. The researchers employ a single-shot detector model called BlazePalm, which achieves an average precision of 95.7% in palm detection.

Next, the hand landmark model takes the cropped image defined by the palm detector and returns 3D hand keypoints. For detecting keypoints on the palm images, the researchers manually annotated around 30K real-world images with 21 coordinates. They also generated a synthetic dataset to improve the robustness of the hand landmark detection model.

The gesture recognizer then classifies the previously computed keypoint configuration into a discrete set of gestures. The algorithm determines the state of each finger, e.g. bent or straight, by the accumulated angles of its joints. The existing pipeline supports counting gestures from multiple cultures, e.g. American, European, and Chinese, and various hand signs including "Thumb up", closed fist, "OK", "Rock", and "Spiderman". The models were also trained to work in a wide variety of lighting situations and with a diverse range of skin tones. A minimal sketch of the finger-state heuristic follows below.

Gesture recognition - Source: Google blog
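The blog post does not include the recognizer's code, but the idea of accumulating joint angles is straightforward to sketch. Below is a minimal, hypothetical Python example; the keypoint indices, threshold, and helper names are our own assumptions for illustration, not MediaPipe's actual API. It classifies each finger of a 21-point 3D hand skeleton as bent or straight:

```python
import numpy as np

# Hypothetical 21-keypoint layout (wrist + 4 points per finger), modeled on
# common hand-landmark conventions; indices are illustrative, not MediaPipe's
# exact ordering.
FINGERS = {
    "thumb":  [1, 2, 3, 4],
    "index":  [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring":   [13, 14, 15, 16],
    "pinky":  [17, 18, 19, 20],
}
WRIST = 0

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def finger_states(keypoints, straight_threshold=40.0):
    """Classify each finger as 'straight' or 'bent' from 3D keypoints.

    A perfectly straight finger has ~180-degree angles at every joint, so its
    accumulated deviation from 180 degrees is near zero; a bent finger
    accumulates a large deviation. The threshold is a guess for illustration.
    """
    states = {}
    for name, idx in FINGERS.items():
        chain = [keypoints[WRIST]] + [keypoints[i] for i in idx]
        deviation = sum(
            180.0 - joint_angle(chain[j - 1], chain[j], chain[j + 1])
            for j in range(1, len(chain) - 1)
        )
        states[name] = "straight" if deviation < straight_threshold else "bent"
    return states

if __name__ == "__main__":
    # Random stand-in keypoints; a real pipeline would supply the 21 x 3
    # array predicted by the hand landmark model.
    dummy = np.random.rand(21, 3)
    print(finger_states(dummy))
```

A real gesture recognizer would then map the resulting finger-state vector to a discrete gesture label ("Thumb up", closed fist, and so on).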
With MediaPipe, the researchers built their pipeline as a directed graph of modular components, called Calculators. Individual calculators for tasks like cropping, rendering, and neural network computation can be executed exclusively on the GPU, and the researchers employed TFLite GPU inference on most modern phones.
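To make the "directed graph of calculators" idea concrete, here is a toy Python sketch of such a pipeline. The class names and stub stages are purely illustrative assumptions and bear no relation to MediaPipe's real C++/graph-config APIs; they only show how named streams can be wired through a chain of modular nodes:

```python
from typing import Callable, List

class Calculator:
    """A single modular processing node, loosely analogous to a MediaPipe
    Calculator (a conceptual sketch, not MediaPipe's actual API)."""
    def __init__(self, name: str, fn: Callable, inputs: List[str], output: str):
        self.name, self.fn, self.inputs, self.output = name, fn, inputs, output

class Pipeline:
    """Runs calculators in declaration order (assumed topological),
    wiring named streams together."""
    def __init__(self, nodes: List[Calculator]):
        self.nodes = nodes

    def run(self, **streams):
        for node in self.nodes:
            args = [streams[s] for s in node.inputs]
            streams[node.output] = node.fn(*args)
        return streams

# Stub stages standing in for the three models described above.
detect_palm = lambda frame: {"bbox": (0, 0, 100, 100)}          # palm detector
predict_landmarks = lambda frame, palm: [(0.0, 0.0, 0.0)] * 21  # hand landmarks
classify_gesture = lambda landmarks: "thumbs_up"                # gesture recognizer

pipeline = Pipeline([
    Calculator("palm_detector", detect_palm, ["frame"], "palm"),
    Calculator("hand_landmarks", predict_landmarks, ["frame", "palm"], "landmarks"),
    Calculator("gesture_recognizer", classify_gesture, ["landmarks"], "gesture"),
])

result = pipeline.run(frame="raw_video_frame")
print(result["gesture"])  # -> "thumbs_up"
```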
The researchers are open sourcing the hand tracking and gesture recognition pipeline in the MediaPipe framework, along with the source code. Researchers Valentin Bazarevsky and Fan Zhang write in a blog post, "Whereas current state-of-the-art approaches rely primarily on powerful desktop environments for inference, our method achieves real-time performance on a mobile phone, and even scales to multiple hands. We hope that providing this hand perception functionality to the wider research and development community will result in an emergence of creative use cases, stimulating new applications and new research avenues."

People commended the fact that the algorithm can run on mobile devices and is useful for people who communicate via sign language.

https://twitter.com/SOdaibo/status/1163577788764495872
https://twitter.com/anshelsag/status/1163597036442148866
https://twitter.com/JonCorey1/status/1163997895835693056

Microsoft Azure VP demonstrates Holoportation, a reconstructed transmittable 3D technology
Terrifyingly realistic Deepfake video of Bill Hader transforming into Tom Cruise is going viral on YouTube
Google News Initiative partners with Google AI to help 'deep fake' audio detection research