
How-To Tutorials - Data

1210 Articles

Data science vs. machine learning: understanding the difference and what it means today

Richard Gall
02 Sep 2019
8 min read
One of the things that I really love about the tech industry is how often different terms - buzzwords especially - can cause confusion. It isn't hard to see this in the wild. Quora is replete with confused people asking about the difference between a 'developer' and an 'engineer' and how 'infrastructure' is different from 'architecture'. One of the biggest points of confusion is the difference between data science and machine learning. Both terms refer to different but related domains - given their popularity it isn't hard to see how some people might be a little perplexed. This might seem like a purely semantic problem, but in the context of people's careers, as they make decisions about the resources they use and the courses they pay for, the distinction becomes much more important. Indeed, it can be perplexing for developers thinking about their career - with machine learning engineer starting to appear across job boards, it's not always clear where that role ends and 'data scientist' begins.

Tl;dr: To put it simply - and if you can't be bothered to read further - data science is a discipline or job role that's all about answering business questions through data. Machine learning, meanwhile, is a technique that can be used to analyze or organize data. So, data scientists might well use machine learning to find something out, but it would only be one aspect of their job.

But what are the implications of this distinction between machine learning and data science? What can the relationship between the two terms tell us about how technology trends evolve? And how can it help us better understand them both?

Read next: 9 data science myths debunked

What's causing confusion about the difference between machine learning and data science?

The data science v machine learning confusion comes from the fact that both terms have a significant grip on the collective imagination of the tech and business world. Back in 2012 the Harvard Business Review declared data scientist to be the 'sexiest job of the 21st century'. This was before the machine learning and artificial intelligence boom, but it's the point we need to go back to if we want to understand how data has shaped the tech industry as we know it today.

Data science v machine learning on Google Trends

Take a look at this Google Trends graph: both terms broadly received a similar level of interest. 'Machine learning' was slightly higher throughout the noughties, and a larger gap has emerged more recently. However, despite that, it's worth looking at the period around 2014 when 'data science' managed to eclipse machine learning. Today, that feels remarkable given how far machine learning has extended into popular consciousness. It suggests that the HBR article was incredibly timely, identifying the emergence of the field. But more importantly, it's worth noting that this spike for 'data science' comes at the time that both terms surge in popularity. So, although machine learning eventually wins out, 'data science' was becoming particularly important at a time when these twin trends were starting to grow.

This is interesting, and it's contrary to what I'd expect. Typically, I'd imagine the more technical term to take precedence over the more conceptual field: a technical trend emerges, and a more abstract concept gains traction afterwards. But here the concept - the discipline - spikes just at the point before machine learning can properly take off.
This suggests that the evolution and growth of machine learning begins with the foundations of data science. This is important. It highlights that the obsession with data science - which might well have seemed somewhat self-indulgent - was, in fact, an integral step for businesses to properly make sense of what the 'big data revolution' (a phrase that sounds eighty years old) meant in practice. Insofar as 'data science' is a term that really just refers to a role that's performed, its growth was ultimately evidence of a space being carved out inside modern businesses that gave a domain expert the freedom to explore and invent in the service of business objectives.

If that was the baseline, then the continued rise of machine learning feels inevitable. From being contained in computer science departments in academia, and then spreading into business thanks to the emergence of the data scientist job role, we then started to see a whole suite of tools and use cases that were about much more than analytics and insight. Machine learning became a practical tool with applications everywhere. From cybersecurity to mobile applications, from marketing to accounting, machine learning couldn't be contained within the data science discipline. This wasn't just a conceptual point - practically speaking, a data scientist simply couldn't provide support to all the different ways in which business functions wanted to use machine learning.

So, the confusion around the relationship between machine learning and data science stems from the fact that the two trends go hand in hand - or at least they used to. To properly understand how they're different, let's look at what a data scientist actually does.

Read next: Data science for non-techies: How I got started (Part 1)

What is data science, exactly?

I know you're not supposed to use Wikipedia as a reference, but the opening sentence in the entry for 'data science' is instructive: "Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data." The word that deserves your attention is multi-disciplinary, as this underlines what makes data science unique and why it stands outside of the more specific taxonomy of machine learning terms. Essentially, it's a human activity as much as a technical one - it's about arranging, organizing, interpreting, and communicating data.

To a certain extent it shares a common thread of DNA with statistics. But although Nate Silver said that 'data scientist' was "a sexed up term for statistician", I think there are some important distinctions. To do data science well you need to be deeply engaged with how your work integrates with the wider business strategy and processes. The term 'statistics' - like 'machine learning' - doesn't quite do this. Indeed, to a certain extent this has made data science a challenging field to work in. It isn't hard to find evidence that data scientists are trying to leave their jobs, frustrated with how their roles are being used and how they integrate into existing organisational structures.

How do data scientists use machine learning?

As a data scientist, your job is to answer questions. These are questions like:

What might happen if we change the price of a product in this way?
What do our customers think of our products?
How often do customers purchase products?
How are customers using our products?
How can we understand the existing market? How might we tackle it?
Where could we improve efficiencies in our processes?

That's just a small set. The types of questions data scientists will be tackling will vary depending on the industry, their company - everything. Every data science job is unique. But whatever questions data scientists are asking, it's likely that at some point they'll be using machine learning. Whether it's to analyze customer sentiment (grouping and sorting) or to predict outcomes, a data scientist will have a number of algorithms up their proverbial sleeves ready to tackle whatever the business throws at them.

Machine learning beyond data science

The machine learning revolution might have started in data science, but it has rapidly expanded far beyond that strict discipline. Indeed, one of the reasons that some people are confused about the relationship between the two concepts is that machine learning today touches just about everything, like water spilling out of its neat data science container.

Machine learning is for everyone

Machine learning is being used in everything from mobile apps to cybersecurity. And although data scientists might sometimes play a part in these domains, we're also seeing subject-specific developers and engineers taking more responsibility for how machine learning is used. One of the reasons for this is, as I mentioned earlier, the fact that a data scientist - or even a couple of them - can't do all the things that a business might want them to when it comes to machine learning. But another is the fact that machine learning is getting easier. You no longer need to be an expert to employ machine learning algorithms - instead, you need to have the confidence and foundational knowledge to use existing machine learning tools and products.

This 'productization' of machine learning is arguably what's having the biggest impact on how we understand the topic. It's even shrinking data science, making it a more specific role. That might sound like data science is less important today than it was in 2014, but it can only be a good thing for data scientists - it means they are no longer being asked to spread themselves so thinly.

So, if you've been googling 'data science v machine learning', you now know the answer. The two terms are distinct but they both come out of the 'big data revolution' which we're still living through. Both trends and terms are likely to evolve in the future, but they're certainly not going to disappear - as the data at our disposal grows, making effective use of it is only going to become more important.
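To make the distinction concrete, here is a minimal sketch of how a data scientist might use machine learning to answer one of the business questions above ("How often do customers purchase products?") by segmenting customers. It assumes NumPy and scikit-learn; the synthetic data and the choice of three segments are purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical customer features: [purchases per month, average order value].
customers = np.column_stack([
    rng.poisson(3, 200),       # purchase frequency
    rng.normal(50, 15, 200),   # average spend
])

# Machine learning is one step of the job: cluster customers into segments...
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(customers)

# ...but the data scientist's real output is the business-facing summary.
for k in range(3):
    freq, spend = customers[segments == k].mean(axis=0)
    print(f"Segment {k}: {freq:.1f} purchases/month, ${spend:.0f} average order")
```

The clustering call is a single line; the surrounding work of framing the question, choosing the features, and communicating the result is the data science.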


Bitbucket to no longer support Mercurial, users must migrate to Git by May 2020

Fatema Patrawala
21 Aug 2019
6 min read
Yesterday marked the end of an era for Mercurial users, as Bitbucket announced it will no longer support Mercurial repositories after May 2020. Bitbucket, owned by Atlassian, is a web-based version control repository hosting service for source code and development projects. It has supported Mercurial since its launch in 2008, and Git since October 2011. Now, after more than a decade of sharing its journey with Mercurial, the Bitbucket team has decided to remove Mercurial support from Bitbucket Cloud and its API. The official announcement reads, "Mercurial features and repositories will be officially removed from Bitbucket and its API on June 1, 2020."

The Bitbucket team also communicated the timeline for the sunsetting of the Mercurial functionality. After February 1, 2020, users will no longer be able to create new Mercurial repositories. After June 1, 2020, users will not be able to use Mercurial features in Bitbucket or via its API, and all Mercurial repositories will be removed. All current Mercurial functionality in Bitbucket will remain available through May 31, 2020.

The team said the decision was not an easy one for them and that Mercurial holds a special place in their heart. But according to a Stack Overflow Developer Survey, almost 90% of developers use Git, while Mercurial is the least popular version control system with only about 3% developer adoption. Apart from this, Mercurial usage on Bitbucket saw a steady decline, and the percentage of new Bitbucket users choosing Mercurial fell to less than 1%. Hence the decision to remove the Mercurial repos.

How can users migrate and export their Mercurial repos

The Bitbucket team recommends that users migrate their existing Mercurial repos to Git (a sketch of one common conversion path appears after this article). They have also extended support for migration, and kept the available options open for discussion in their dedicated Community thread. Users can discuss conversion tools, migration, tips, and also offer troubleshooting help. If users prefer to continue using Mercurial, there are a number of free and paid Mercurial hosting services for them. The Bitbucket team has also created a Git tutorial that covers everything from the basics of creating pull requests to rebasing and Git hooks.

Community shows anger and sadness over decision to discontinue Mercurial support

There is outrage among Mercurial users, who are extremely unhappy and saddened by this decision from Bitbucket. They have expressed anger not just on one platform but across multiple forums and community discussions. Users feel that Bitbucket's decision to stop offering Mercurial support is bad, but the decision to also delete the repos is evil. On Hacker News, users speculated that this decision was influenced by marketing potential rather than by technically superior architecture and ease of use. They feel GitHub has successfully marketed Git, and that's how both have become synonymous to the developer community. One of them comments,

"It's very sad to see bitbucket dropping mercurial support. Now only Facebook and volunteers are keeping mercurial alive. Sometimes technically better architecture and user interface lose to a non user friendly hard solutions due to inertia of mass adoption. So a lesson in Software development is similar to betamax and VHS, so marketing is still a winner over technically superior architecture and ease of use. GitHub successfully marketed git, so git and GitHub are synonymous for most developers.
Now majority of open source projects are reliant on a single proprietary solution Github by Microsoft, for managing code and project. Can understand the difficulty of bitbucket, when Python language itself moved out of mercurial due to the same inertia. Hopefully gitlab can come out with mercurial support to migrate projects using it from bitbucket."

Another user comments that Mercurial support was the only reason for them to use Bitbucket, given that GitHub is miles ahead; now that Bitbucket is dropping Mercurial too, they expect it to fade away. The comment reads,

"Mercurial support was the one reason for me to still use Bitbucket: there is no other Bitbucket feature I can think of that Github doesn't already have, while Github's community is miles ahead since everyone and their dog is already there. More importantly, Bitbucket leaves the migration to you (if I read the article correctly). Once I download my repo and convert it to git, why would I stay with the company that just made me go through an annoying (and often painful) process, when I can migrate to Github with the exact same command? And why isn't there a "migrate this repo to git" button right there? I want to believe that Bitbucket has smart people and that this choice is a good one. But I'm with you there - to me, this definitely looks like Bitbucket will die."

On Reddit, programmers see this as a big change, since Bitbucket is the major Mercurial hosting provider, and feel the announcement came at pretty short notice, leaving too little time for migration. Beyond the developer community forums, users have also expressed displeasure on the Atlassian community blog. A team of scientists commented,

"Let's get this straight: Bitbucket (offering hosting support for Mercurial projects) was acquired by Atlassian in September 2010. Nine years later Atlassian decides to drop Mercurial support and delete all Mercurial repositories. Atlassian, I hate you :-) The image you have for me is that of a harmful predator. We are a team of scientists working in a university. We don't have computer scientists, we managed to use a version control simple as Mercurial, and it was a hard work to make all scientists in our team to use a version control system (even as simple as Mercurial). We don't have the time nor the energy to switch to another version control system. But we will, forced and obliged. I really don't want to check out Github or something else to migrate our projects there, but we will, forced and obliged."

Read next:
Atlassian Bitbucket, GitHub, and GitLab take collective steps against the Git ransomware attack
Attackers wiped many GitHub, GitLab, and Bitbucket repos with 'compromised' valid credentials leaving behind a ransom note
BitBucket goes down for over an hour
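For readers planning the migration described above, here is a hedged sketch of one commonly documented conversion path, wrapped in Python for scripting. It assumes Git, Mercurial, and the third-party hg-fast-export tool (github.com/frej/fast-export) are installed and on the PATH; the repository paths are placeholders.

```python
import subprocess
from pathlib import Path

def migrate_hg_to_git(hg_repo: str, git_repo: str) -> None:
    """Convert a local Mercurial repository to a new Git repository."""
    Path(git_repo).mkdir(parents=True, exist_ok=True)
    # Initialize an empty Git repository to receive the converted history.
    subprocess.run(["git", "init"], cwd=git_repo, check=True)
    # Replay the full Mercurial history into Git via hg-fast-export.
    subprocess.run(["hg-fast-export.sh", "-r", hg_repo], cwd=git_repo, check=True)
    # Materialize a working tree from the imported history.
    subprocess.run(["git", "checkout", "HEAD"], cwd=git_repo, check=True)

migrate_hg_to_git("/path/to/mercurial-repo", "/path/to/new-git-repo")
```

After conversion, the new repository can be pushed to any Git host with `git remote add` and `git push`.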


Google open sources an on-device, real-time hand gesture recognition algorithm built with MediaPipe

Sugandha Lahoti
21 Aug 2019
3 min read
Google researchers have unveiled a new real-time hand tracking algorithm that could be a breakthrough for people communicating via sign language. Their algorithm uses machine learning to compute 3D keypoints of a hand from a video frame. This research is implemented in MediaPipe, an open-source cross-platform framework for building multimodal (e.g. video, audio, or any time series data) applied ML pipelines. What is interesting is that the 3D hand perception can be viewed in real time on a mobile phone.

How does real-time hand perception and gesture recognition work with MediaPipe?

The algorithm is built using the MediaPipe framework. Within this framework, the pipeline is built as a directed graph of modular components. The pipeline employs three different models: a palm detector model, a hand landmark detector model, and a gesture recognizer.

The palm detector operates on full images and outputs an oriented bounding box. The researchers employ a single-shot detector model called BlazePalm, which achieves an average precision of 95.7% in palm detection. Next, the hand landmark model takes the cropped image defined by the palm detector and returns 3D hand keypoints. For detecting keypoints on the palm images, researchers manually annotated around 30K real-world images with 21 coordinates. They also generated a synthetic dataset to improve the robustness of the hand landmark detection model. The gesture recognizer then classifies the previously computed keypoint configuration into a discrete set of gestures. The algorithm determines the state of each finger, e.g. bent or straight, by the accumulated angles of joints.

The existing pipeline supports counting gestures from multiple cultures, e.g. American, European, and Chinese, and various hand signs including "Thumb up", closed fist, "OK", "Rock", and "Spiderman". They also trained their models to work in a wide variety of lighting situations and with a diverse range of skin tones.

Gesture recognition - Source: Google blog

With MediaPipe, the researchers built their pipeline as a directed graph of modular components, called Calculators. Individual calculators like cropping, rendering, and neural network computations can be performed exclusively on the GPU. They employed TFLite GPU inference on most modern phones.

The researchers are open sourcing the hand tracking and gesture recognition pipeline in the MediaPipe framework along with the source code (a minimal usage sketch follows this article). The researchers Valentin Bazarevsky and Fan Zhang write in a blog post, "Whereas current state-of-the-art approaches rely primarily on powerful desktop environments for inference, our method achieves real-time performance on a mobile phone, and even scales to multiple hands. We hope that providing this hand perception functionality to the wider research and development community will result in an emergence of creative use cases, stimulating new applications and new research avenues."

People commended the fact that this algorithm can run on mobile devices and is useful for people who communicate via sign language.
https://twitter.com/SOdaibo/status/1163577788764495872
https://twitter.com/anshelsag/status/1163597036442148866
https://twitter.com/JonCorey1/status/1163997895835693056

Read next:
Microsoft Azure VP demonstrates Holoportation, a reconstructed transmittable 3D technology
Terrifyingly realistic Deepfake video of Bill Hader transforming into Tom Cruise is going viral on YouTube
Google News Initiative partners with Google AI to help 'deep fake' audio detection research
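As a rough illustration of what the open sourced pipeline exposes, here is a minimal sketch using MediaPipe's Python solutions API (added to the framework after this article was written) together with OpenCV; the confidence thresholds are illustrative, not the research configuration.

```python
import cv2
import mediapipe as mp

# The Hands solution bundles the palm detector and hand landmark model.
hands = mp.solutions.hands.Hands(
    static_image_mode=False,       # video mode: the palm detector runs only when needed
    max_num_hands=2,
    min_detection_confidence=0.7,
)

capture = cv2.VideoCapture(0)
while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV captures BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            # 21 3D keypoints per hand, normalized to the image size.
            wrist = hand.landmark[0]
            print(f"wrist at ({wrist.x:.2f}, {wrist.y:.2f}, depth {wrist.z:.2f})")
capture.release()
```

A gesture recognizer of the kind described above would then classify the 21-keypoint configuration, for example by thresholding the accumulated joint angles of each finger.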


Twitter and Facebook removed accounts of Chinese state-run media agencies aimed at undermining Hong Kong protests

Sugandha Lahoti
20 Aug 2019
5 min read
Update, August 23, 2019: After Twitter and Facebook, Google has shut down 210 YouTube channels that were tied to misinformation about the Hong Kong protests. The article has been updated accordingly.

Chinese state-run media agencies have been buying advertisements and promoted tweets on Twitter and Facebook to portray Hong Kong protestors and their pro-democracy demonstrations as violent. These ads, reported by Pinboard's Twitter account, were circulated by state-run news agency Xinhua, describing the protesters as "escalating violence" and calling for "order to be restored". In reality, the Hong Kong protests have been described as completely peaceful marches. Pinboard warned and criticized Twitter about these tweets and asked for their takedown. Though Twitter and Facebook are banned in China, Chinese state-run media runs several English-language accounts to present its views to the outside world.
https://twitter.com/pinboard/status/1162711159000055808
https://twitter.com/Pinboard/status/1163072157166886913

Twitter bans 936 accounts managed by the Chinese state

Following this revelation, in a blog post yesterday, Twitter said that they had discovered a "significant state-backed information operation focused on the situation in Hong Kong, specifically the protest movement". They identified 936 accounts that were undermining "the legitimacy and political positions of the protest movement on the ground." They also found a larger, spammy network of approximately 200,000 accounts representing the most active portions of this campaign; these were suspended for a range of violations of their platform manipulation policies. These accounts were able to access Twitter through VPNs and over a "specific set of unblocked IP addresses" from within China. "Covert, manipulative behaviors have no place on our service — they violate the fundamental principles on which our company is built," said Twitter.

Twitter bans ads from Chinese state-run media

Twitter also banned advertising from Chinese state-run news media entities across the world and declared that affected accounts will be free to continue to use Twitter to engage in public conversation, but not to use its advertising products. This policy will apply to news media entities that are either financially or editorially controlled by the state, said Twitter. Affected entities will be notified directly and given 30 days to offboard from advertising products; no new campaigns will be allowed. However, Pinboard argues that 30 days is too long, and that Twitter should not wait and should suspend Xinhua's ad account immediately.
https://twitter.com/Pinboard/status/1163676410998689793

It also calls on Twitter to disclose:
How much money it took from Xinhua
How many ads it ran for them since the start of the Hong Kong protests in June
How those ads were targeted

Facebook blocks Chinese accounts engaged in inauthentic behavior

Following a tip shared by Twitter, Facebook also removed seven Pages, three Groups and five Facebook accounts involved in coordinated inauthentic behavior as part of a small network that originated in China and focused on Hong Kong. However, unlike Twitter, Facebook did not announce any policy changes in response to the discovery. YouTube was also notably absent from the fight against Chinese misinformation propaganda.
https://twitter.com/Pinboard/status/1163694701716766720

However, on August 22, YouTube axed 210 channels found to be spreading misinformation about the Hong Kong protests.
"Earlier this week, as part of our ongoing efforts to combat coordinated influence operations, we disabled 210 channels on YouTube when we discovered channels in this network behaved in a coordinated manner while uploading videos related to the ongoing protests in Hong Kong," Shane Huntley, director of software engineering for Google Security's Threat Analysis Group, said in a blog post. "We found use of VPNs and other methods to disguise the origin of these accounts and other activity commonly associated with coordinated influence operations."

Kyle Bass, Chief Investment Officer at Hayman Capital Management, called on all social media outlets to ban all Chinese state-run propaganda sources. He tweeted, "Twitter, Facebook, and YouTube should BAN all State-backed propaganda sources in China. It's clear that these 200,000 accounts were set up by the "state" of China. Why allow Xinhua, global times, china daily, or any others to continue to act? #BANthemALL"

Public acknowledges Facebook and Twitter's role in exposing Chinese state media

Experts and journalists appreciated the role social media companies played in exposing those responsible and how they are responding to state interventions. Bethany Allen-Ebrahimian, President of the International China Journalist Association, called it huge news. "This is the first time that US social media companies are openly accusing the Chinese government of running Russian-style disinformation campaigns aimed at sowing discord", she tweeted. She added, "We've been seeing hints that China has begun to learn from Russia's MO, such as in Taiwan and Cambodia. But for Twitter and Facebook to come out and explicitly accuse the Chinese govt of a disinformation campaign is another whole level entirely."

Adam Schiff, Representative (D-CA 28th District), tweeted, "Twitter and Facebook announced they found and removed a large network of Chinese government-backed accounts spreading disinformation about the protests in Hong Kong. This is just one example of how authoritarian regimes use social media to manipulate people, at home and abroad." He added, "Social media platforms and the U.S. government must continue to identify and combat state-backed information operations online, whether they're aimed at disrupting our elections or undermining peaceful protesters who seek freedom and democracy."

Social media platforms took an appreciable step against Chinese state-run media actors attempting to manipulate their platforms to discredit grassroots organizing in Hong Kong. It would be interesting to see if they would continue to protect individual freedoms and provide a safe and transparent platform if state actors from countries where they have huge audiences, like India or the US, adopted similar tactics to suppress or manipulate the public or target movements.

Read next:
Facebook bans six toxic extremist accounts and a conspiracy theory organization
Cloudflare terminates services to 8chan following yet another set of mass shootings in the US
YouTube's ban on "instructional hacking and phishing" videos receives backlash from the infosec community


Terrifyingly realistic Deepfake video of Bill Hader transforming into Tom Cruise is going viral on YouTube

Sugandha Lahoti
14 Aug 2019
4 min read
Deepfakes are becoming scarily, indistinguishably real. A YouTube clip of Bill Hader in conversation with David Letterman on his late-night show in 2008 is going viral: as Hader does his Tom Cruise impression, his face subtly shifts into Cruise's. The viral deepfake clip has been viewed over 3 million times and was uploaded by Ctrl Shift Face (a Slovakian citizen who goes by the name of Tom), who has created other entertaining videos using deepfake technology. For the unaware, a deepfake uses artificial intelligence and deep neural networks to alter audio or video to pass it off as true or original content.
https://www.youtube.com/watch?v=VWrhRBb-1Ig

Deepfakes are problematic as they make it hard to differentiate between fake and real videos or images. This gives people the liberty to use deepfakes to promote harassment and illegal activities. The most common uses of deepfakes are found in revenge porn, political abuse, and fake celebrity videos such as this one.

The top comments on the video clip express the dangers of realistic AI manipulation:
"The fade between faces is absolutely unnoticeable and it's flipping creepy. Nice job!"
"I'm always amazed with new technology, but this is scary."
"Ok, so video evidence in a court of law just lost all credibility"
https://twitter.com/TheMuleFactor/status/1160925752004624387

Deepfakes can also be used as a weapon of misinformation, since they can be used to maliciously hoax governments and populations and cause internal conflict. Gavin Sheridan, CEO of Vizlegal, also tweeted the clip: "Imagine when this is all properly weaponized on top of already fractured and extreme online ecosystems and people stop believing their eyes and ears."

He also talked about the future impact. "True videos will be called fake videos, fake videos will be called true videos. People steered towards calling news outlets "fake", will stop believing their own eyes. People who want to believe their own version of reality will have all the videos they need to support it," he tweeted. He also asked whether we would need A-list movie actors at all in the future, and whether we could choose which actor portrays which role. His tweet reads, "Will we need A-list actors in the future when we could just superimpose their faces onto the faces of other actors? Would we know the difference? And could we not choose at the start of a movie which actors we want to play which roles?"

The past year has seen accelerated growth in the use of deepfakes. In June, a fake video of Mark Zuckerberg was posted on Instagram under the username bill_posters_uk. In the video, Zuckerberg appears to give a threatening speech about the power of Facebook. Facebook had received strong criticism for promoting fake videos on its platform when, in May, the company refused to remove a doctored video of senior politician Nancy Pelosi. Samsung researchers also released a deepfake that could animate faces with just your voice and a picture using temporal GANs. Following this, the House Intelligence Committee held a hearing to examine the public risks posed by "deepfake" videos.

Tom, the creator of the viral video, told The Guardian that he doesn't see deepfake videos as the end of the world and hopes his deepfakes will raise public awareness of the technology's potential for misuse. "It's an arms race; someone is creating deepfakes, someone else is working on other technologies that can detect deepfakes. I don't really see it as the end of the world like most people do. People need to learn to be more critical.
The general public are aware that photos could be Photoshopped, but they have no idea that this could be done with video."

Ctrl Shift Face is also on Patreon, offering access to bonus materials, behind-the-scenes footage, deleted scenes, and early access to videos for those who provide him monetary support.

Read next:
Now there is a Deepfake that can animate your face with just your voice and a picture
Mark Zuckerberg just became the target of the world's first high profile white hat deepfake op
Worried about Deepfakes? Check out the new algorithm that manipulates talking-head videos by altering the transcripts


How Data Privacy awareness is changing how companies do business

Guest Contributor
09 Aug 2019
7 min read
Not so long ago, data privacy was a relatively small part of business operations at some companies. They paid attention to it to a minor degree, but it was not a focal point or prime area of concern. That's all changing as businesses now recognize that failing to take privacy seriously harms the bottom line, and that revelation is changing how they operate and engage with customers.

One of the reasons for this change is the General Data Protection Regulation (GDPR), which now affects all European Union companies and those that do business with EU residents. Some analysts viewed regulators as slow to begin enforcing the GDPR with fines, but the fines imposed in 2019 total more than $100 million. In 2018, Twitter and Nielsen cited the GDPR as a reason for their falling share prices.

No Single Way to Demonstrate Data Privacy Awareness

One essential thing for companies to keep in mind is that there is no all-encompassing way to show customers they emphasize data security. Although security and privacy are distinct, they are closely related and impact each other. What privacy awareness means differs depending on how a business operates. For example, a business might collect data from customers and feed it back to them through an analytics platform. In this case, showing data privacy awareness might mean publishing a policy stating that the company will never sell a person's information to others. For an e-commerce company, emphasizing a commitment to keeping customer information secure might mean going into detail about how it protects sensitive data such as credit card numbers. It might also mean talking about internal strategies used to keep customer information as safe as possible from cybercriminals.

One universal aspect of data privacy awareness is that it makes good business sense. The public is now much more aware of data privacy issues than in past years, largely due to the high-profile breaches that capture the headlines.

Lost customers, gigantic fines and damaged reputations after data breaches and misuse

When companies don't invest in data privacy measures, they could be victimized by severe data breaches, and if that happens, the ramifications are often substantial. A 2019 study from PCI Pal surveyed customers in the United States and the United Kingdom to determine how their perceptions and spending habits changed following data breaches. It found that 41% of United Kingdom customers and 21% of people in the U.S. stop spending money at a business forever if it suffers a data breach. The more common action is for consumers to stop spending money at breached businesses for several months afterward; in total, 62% of Americans and 44% of Brits said they'd take that approach.

However, that's not the only potential hit to a company's profitability. As the Facebook example mentioned later indicates, there can also be massive fines. Two other recent examples involve the British Airways and Marriott Hotels breaches. A data regulatory body in the United Kingdom imposed the largest-ever data breach fine on British Airways after a 2018 hack, with the penalty totaling £183 million — more than $228 million. Then, that same authority gave Marriott Hotels the equivalent of a $125 million fine for its incident, alleging inadequate cybersecurity and data privacy due diligence. These enormous fines don't only happen in the United Kingdom. Besides its recent decision with Facebook, the U.S.
Federal Trade Commission (FTC) reached a settlement with Equifax that required the company to pay $700 million after its now-infamous data breach. It's easy to see why losing customers after such issues could make such substantial fines even more painful for the companies that have to pay them. The FTC also investigated Facebook's Cambridge Analytica scandal and handed the company a $5 billion fine for failing to adequately protect customer data — the largest imposed by the FTC.

Problems also occur if companies misuse data. Take the example of a class-action lawsuit filed against AT&T. The telecom giant and a couple of data aggregation enterprises allegedly permitted third-party companies to access individuals' real-time locations via mobile phone data, without first checking whether the customers allowed such access. Such news could bring about irreparable reputational damage and make people hesitant to do business with the companies involved.

Expecting customers to read privacy policies is not sufficient

Companies rely on both back-end and customer-facing strategies to meet their data security goals and earn customer trust. Some businesses go beyond the norm by publishing sections on their websites that detail how their infrastructure supports data privacy. They discuss the implementation of things like multi-layered data access authorization frameworks, physical access controls for server rooms, and data encryption at rest and in transit. But one of the more prominent customer-facing declarations of a company's commitment to keeping data secure is the privacy policy, now a fixture of modern websites.

Companies cannot bypass publishing their privacy policies, of course. However, most people don't take the time to read those documents. An Axios/SurveyMonkey poll spotlighted a disconnect between respondents' beliefs and actions: although 87% of them felt it was either somewhat or very important to understand a company's privacy policy before signing up for something, 56% of them always or usually agree to it without reading it. More research on the subject by Varonis found that it can take nearly half an hour to read some privacy policies, and that their reading level got more advanced after the GDPR came into effect. Together, these studies illustrate that companies need to go beyond anticipating that customers will read what privacy policies say. Moreover, they should work hard to make them shorter and easier for people to understand.

Most people want companies to take a stand for Data Privacy

A study of 1,000 people conducted in the United Kingdom supported the earlier finding from Gemalto that people think the companies holding their data are responsible for maintaining its security. It concluded that customers felt it was "highly important" for businesses to take a stand for information security and privacy, and that 53% expected firms to do so. Moreover, the results of a CIGI-Ipsos worldwide survey said that 53% of those polled were more concerned about online privacy than a year ago, and 49% said their rising distrust of the internet made them provide less information online.

Companies must show they care about data privacy and work that aspect into their business strategies. Otherwise, they could find that customers leave them in favor of more privacy-centric organizations. To get an idea of what can happen when companies have data privacy blunders, people only need to look at how Facebook users responded in the Cambridge Analytica aftermath.
Statistics published by the Pew Research Center showed that 54% of adults changed their privacy settings in the past year, while approximately a quarter stopped using the site. After the news broke about Facebook and Cambridge Analytica, many media outlets reminded people that they could download all the data Facebook had about them. The Pew Research Center found that although only 9% of its respondents took that step, 47% of the people in that group removed the app from their phones.

Data Privacy is a Top-of-Mind concern

The studies and examples mentioned here strongly suggest consumers are no longer willing to accept the possible wrongful treatment of their data. They increasingly hold companies accountable and show no forgiveness when their privacy expectations aren't met. The most forward-thinking companies see this change and respond accordingly; those that choose inaction instead are at risk of losing out. Individuals understand that companies value their data, but they aren't willing to part with it freely unless companies convey trustworthiness first.

Author Bio

Kayla Matthews writes about big data, cybersecurity, and technology. You can find her work on The Week, Information Age, KDnuggets and CloudTweaks, or over at ProductivityBytes.com.

Read next:
Facebook fails to block ECJ data security case from proceeding
ICO to fine Marriott over $124 million for compromising 383 million users' data
Facebook fined $2.3 million by Germany for providing incomplete information about hate speech content

Facebook research suggests chatbots and conversational AI are on the verge of empathizing with humans

Fatema Patrawala
06 Aug 2019
6 min read
Last week, the Facebook AI research team published a progress report on dialogue research aimed at building more engaging and personalized AI systems. According to the team, "Dialogue research is a crucial component of building the next generation of intelligent agents. While there's been progress with chatbots in single-domain dialogue, agents today are far from capable of carrying an open-domain conversation across a multitude of topics. Agents that can chat with humans in the way that people talk to each other will be easier and more enjoyable to use in our day-to-day lives — going beyond simple tasks like playing a song or booking an appointment."

In their blog post, they describe new open source data sets, algorithms, and models that address five common weaknesses of open-domain chatbots today: maintaining consistency, specificity, empathy, knowledgeability, and multimodal understanding. Let us look at each one in detail.

Dataset called Dialogue NLI introduced for maintaining consistency

Inconsistencies are a common issue for chatbots, partly because most models lack explicit long-term memory and semantic understanding. The Facebook team, in collaboration with colleagues at NYU, developed a new way of framing the consistency of dialogue agents as natural language inference (NLI) and created a new NLI data set called Dialogue NLI, used to improve and evaluate the consistency of dialogue models. The team showcased an example of the Dialogue NLI approach, wherein they considered two utterances in a dialogue as the premise and hypothesis, respectively. Each pair was labeled to indicate whether the premise entails, contradicts, or is neutral with respect to the hypothesis. Training an NLI model on this data set and using it to rerank the model's responses to entail previous dialogues — or maintain consistency with them — improved the overall consistency of the dialogue agent (a minimal sketch of this reranking idea appears after this article). Across these tests, they report 3x fewer contradictions in the sentences.

Several conversational attributes were studied to balance specificity

As per the team, generative dialogue models frequently default to generic, safe responses, like "I don't know", even to queries that need specific answers. Hence, the Facebook team, in collaboration with Stanford AI researcher Abigail See, studied how to fix this by controlling several conversational attributes, like the level of specificity. In one experiment, they conditioned a bot on character information and asked "What do you do for a living?" A typical chatbot responds with the generic statement "I'm a construction worker." With control methods, the chatbots proposed more specific and engaging responses, like "I build antique homes and refurbish houses." In addition to specificity, the team mentioned that balancing question-asking and answering, and controlling how repetitive the models are, make significant differences. The better the overall conversation flow, the more engaging and personable the chatbots and dialogue agents of the future will be.

Chatbot's ability to display empathy while responding was measured

The team worked with researchers from the University of Washington to introduce the first benchmark task of human-written empathetic dialogues centered on specific emotional labels to measure a chatbot's ability to display empathy.
In addition to improving on automatic metrics, the team showed that using this data both for fine-tuning and as retrieval candidates leads to responses that are evaluated by humans as more empathetic, with an average improvement of 0.95 points (on a 1-to-5 scale) across three different retrieval and generative models. The next challenge for the team is that empathy-focused models should perform well in complex dialogue situations, where agents may need to balance empathy with staying on topic or providing information.

Wikipedia dataset used to make dialogue models more knowledgeable

The research team improved dialogue models' ability to demonstrate knowledge by collecting a data set of conversations grounded in Wikipedia, and creating new model architectures that retrieve knowledge, read it, and condition responses on it. This generative model yielded the most pronounced improvement, and it is rated by humans as 26% more engaging than its knowledgeless counterpart.

To engage with images, personality based captions were used

To engage with humans, agents should not only comprehend dialogue but also understand images. In this research, the team focused on image captioning that is engaging for humans by incorporating personality. They collected a data set of human comments grounded in images, and trained models capable of discussing images with given personalities, which makes the system interesting for humans to talk to; 64% of humans preferred these personality-based captions over traditional captions. To build strong models, the team considered both retrieval and generative variants, and leveraged modules from both the vision and language domains. They defined a powerful retrieval architecture, named TransResNet, which works by projecting the image, personality, and caption into the same space using image, personality, and text encoders. The team showed that their system was able to produce captions that come close to matching human performance in terms of engagement and relevance, and annotators preferred their retrieval model's captions over captions written by people 49.5% of the time.

Apart from this, the Facebook team has released a new data collection and model evaluation tool, a Messenger-based chatbot game called Beat the Bot, that allows people to interact directly with bots and other humans in real time, creating rich examples to help train models.

To conclude, the Facebook AI team mentions, "Our research has shown that it is possible to train models to improve on some of the most common weaknesses of chatbots today. Over time, we'll work toward bringing these subtasks together into one unified intelligent agent by narrowing and eventually closing the gap with human performance. In the future, intelligent chatbots will be capable of open-domain dialogue in a way that's personable, consistent, empathetic, and engaging."

On Hacker News, this research has received both positive and negative reviews. Some commenters argue that AI conversing like humans will do a lot of harm, while others say it is an impressive improvement in the field of conversational AI. One comment reads, "I gotta say, when AI is able to converse like humans, a lot of bad stuff will happen. People are so used to the other conversation partner having self-interest, empathy, being reasonable. When enough bots all have a "swarm" program to move conversations in a particular direction, they will overwhelm any public conversation.
Moreover, in individual conversations, you won't be able to trust anything anyone says or negotiates. Just like playing chess or poker online now. And with deepfakes, you won't be able to trust audio or video either. The ultimate shock will come when software can render deepfakes in realtime to carry on a conversation, as your friend but not. As a politician who "said crazy stuff" but really didn't, but it's in the realm of believability. I would give it about 20 years until it all goes to shit. If you thought fake news was bad, realtime deepfakes and AI conversations with "friends" will be worse."

Read next:
Scroll Snapping and other cool CSS features come to Firefox 68
Google Chrome to simplify URLs by hiding special-case subdomains
Lyft releases an autonomous driving dataset "Level 5" and sponsors research competition
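As an illustration of the consistency-reranking idea described in this article, here is a minimal sketch that scores candidate replies against the dialogue history with an NLI-style contradiction scorer and filters out contradictions. The scorer here is a toy stand-in, not Facebook's Dialogue NLI model; a trained NLI classifier would replace it in practice.

```python
from typing import Callable, List

def rerank_candidates(
    history: List[str],
    candidates: List[str],
    contradiction_score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Order candidate replies by how little they contradict the history."""
    def worst_contradiction(reply: str) -> float:
        # Treat each past utterance as the premise and the reply as the
        # hypothesis; keep the worst (highest) contradiction probability.
        return max(contradiction_score(premise, reply) for premise in history)

    kept = [c for c in candidates if worst_contradiction(c) < threshold]
    return sorted(kept or candidates, key=worst_contradiction)

# Toy stand-in: flags a reply that contradicts a remembered persona fact.
def toy_scorer(premise: str, hypothesis: str) -> float:
    return 1.0 if "construction worker" in premise and "teach" in hypothesis else 0.0

history = ["I'm a construction worker."]
print(rerank_candidates(history, ["I teach math.", "I build houses."], toy_scorer))
# -> ['I build houses.']
```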


Why are experts worried about Microsoft's billion dollar bet in OpenAI's AGI pipe dream?

Sugandha Lahoti
23 Jul 2019
6 min read
Microsoft has invested $1 billion in OpenAI with the goal of building next-generation supercomputers and a platform within Microsoft Azure that will scale to AGI (Artificial General Intelligence). This is a multiyear partnership, with Microsoft becoming OpenAI's preferred partner for commercializing new AI technologies. OpenAI will become a big Azure customer, porting its services to run on Microsoft Azure.

The $1 billion is a cash investment into OpenAI LP, OpenAI's for-profit corporate subsidiary. The investment will follow a standard capital commitment structure, which means OpenAI can call for it as needed, though the company plans to spend it in less than five years. Per the official press release, "The companies will focus on building a computational platform in Azure for training and running advanced AI models, including hardware technologies that build on Microsoft's supercomputing technology. These will be implemented in a safe, secure and trustworthy way and is a critical reason the companies chose to partner together." They intend to license some of their pre-AGI technologies, with Microsoft becoming their preferred partner.

"My goal in running OpenAI is to successfully create broadly beneficial A.G.I.," Sam Altman, who co-founded OpenAI with Elon Musk, said in a recent interview. "And this partnership is the most important milestone so far on that path." Musk left the company in February 2019 to focus on Tesla, and because he didn't agree with some of what the OpenAI team wanted to do.

What does this partnership mean for Microsoft and OpenAI

OpenAI may benefit from this deal by keeping its innovations private, which may help commercialization, raise more funds, and get to AGI faster. For OpenAI this means the availability of resources for AGI, while potentially giving founders and other investors the opportunity to either double down on OpenAI or reallocate resources to other initiatives. However, it may also lead to OpenAI not disclosing progress, papers with details, and open source code as much as in the past.
https://twitter.com/Pinboard/status/1153380118582054912

As for Microsoft, this deal is another attempt at quietly taking over open source: first with the acquisition of GitHub and the subsequent launch of GitHub Sponsors, and now with becoming OpenAI's 'preferred partner' for commercialization. Last year at an investor conference, Nadella said, "AI is going to be one of the trends that is going to be the next big shift in technology. It's going to be AI at the edge, AI in the cloud, AI as part of SaaS applications, AI as part of in fact even infrastructure. And to me, to be the leader in it, it's not enough just to sort of have AI capability that we can exercise—you also need the ability to democratize it so that every business can truly benefit from it. That to me is our identity around AI." Partnership with OpenAI seems to be a part of this plan.

This deal can also possibly help Azure catch up with Google and Amazon in both hardware scalability and artificial intelligence offerings. A Hacker News user comments, "OpenAI will adopt and make Azure their preferred platform. And Microsoft and Azure will jointly "develop new Azure AI supercomputing technologies", which I assume is advancing their FGPA-based deep learning offering.
Google has a lead with TensorFlow + TPUs and this is a move to "buy their way in", which is a very Microsoft thing to do."
https://twitter.com/soumithchintala/status/1153308199610511360

It is also likely that Microsoft is investing money that will eventually be pumped back into its own company, as OpenAI buys computing power from the tech giant. Under the terms of the contract, Microsoft will eventually become the sole cloud computing provider for OpenAI, and most of that $1 billion will be spent on computing power, Altman says. OpenAI, previously focused on building ethical AI, may now pivot to building cutting-edge AI and moving towards AGI, sometimes even neglecting ethical ramifications in the rush to deploy technology early, which is what Microsoft would be interested in monetizing.
https://twitter.com/CadeMetz/status/1153291410994532352

"I see two primary motivations: For OpenAI—to secure funding and to gain some control over hardware which in turn helps differentiate software. For MSFT—to elevate Azure in the minds of developers for AI training." - James Wang, Analyst at ARKInvest
https://twitter.com/jwangARK/status/1153338174871154689

However, the news of this investment did not go down well with some experts in the field, who saw it as a purely commercial deal and questioned whether OpenAI's switch to for-profit research undermines its claims to be "democratizing" AI.
https://twitter.com/fchollet/status/1153489165595504640

"I can't really parse its conversion into an LP—and Microsoft's huge investment—as anything but a victory for capital" - Robin Sloan, Author
https://twitter.com/robinsloan/status/1153346647339876352

"What is OpenAI? I don't know anymore." - Stephen Merity, Deep learning researcher
https://twitter.com/Smerity/status/1153364705777311745
https://twitter.com/SamNazarius/status/1153290666413383682

People are also speculating whether creating AGI is even possible. In a recent survey, experts estimated that there was a 50 percent chance of creating AGI by the year 2099. Per the New York Times, most experts believe A.G.I. will not arrive for decades or even centuries; even Altman admits OpenAI may never get there. But the race is on nonetheless. Then why is Microsoft delivering the $1 billion over five years, considering that is neither enough money nor enough time to produce AGI?

Still, OpenAI has certainly impressed the tech community with its AI innovations. In April, OpenAI's new algorithm trained to play the complex strategy game Dota 2 beat the world champion e-sports team OG at an event in San Francisco, winning the first two matches of the 'best-of-three' series. The competition pitted a human team of five professional Dota 2 players against an AI team of five OpenAI bots. In February, OpenAI released a new AI model, GPT-2, capable of generating coherent paragraphs of text without needing any task-specific training. However, experts felt that the move signalled 'closed AI' and propagated the 'fear of AI' given the model's ability to write convincing fake news from just a few words.

Read next:
Github Sponsors: Could corporate strategy eat FOSS culture for dinner?
Microsoft is seeking membership to Linux-distros mailing list for early access to security vulnerabilities
OpenAI: Two new versions and the output dataset of GPT-2 out!


Why Intel is betting on BFLOAT16 to be a game changer for deep learning training? Hint: Range trumps Precision.

Vincy Davis
22 Jul 2019
4 min read
A group of researchers from Intel Labs and Facebook have published a paper titled "A Study of BFLOAT16 for Deep Learning Training". The paper presents a comprehensive study indicating the success of the Brain Floating Point (BFLOAT16) half-precision format in deep learning training across image classification, speech recognition, language modeling, generative networks, and industrial recommendation systems. BFLOAT16 has a 7-bit mantissa but an 8-bit exponent, the same exponent width as FP32, so it preserves FP32's dynamic range with less precision. BFLOAT16 was originally developed by Google and implemented in its third-generation Tensor Processing Unit (TPU).
https://twitter.com/JeffDean/status/1134524217762951168

Many state-of-the-art training platforms use IEEE-754 half precision or automatic mixed precision as their preferred numeric format for deep learning training. However, these formats fall short when representing small error gradients during backpropagation, and thus cannot deliver the desired performance gains. BFLOAT16 exhibits a dynamic range that can represent error gradients during backpropagation, which enables easier migration of deep learning workloads to BFLOAT16 hardware.

(Image source: the BFLOAT16 paper)

In the table above, all the values are represented as trimmed full-precision floating point values with 8 bits of mantissa, with their dynamic range comparable to FP32. By adopting the BFLOAT16 numeric format, core compute primitives such as Fused Multiply Add (FMA) can be built using 8-bit multipliers. This leads to a significant reduction in area and power while preserving the full dynamic range of FP32.

How are deep neural networks (DNNs) trained with BFLOAT16?

The figure below shows the mixed-precision data flow used to train deep neural networks using the BFLOAT16 numeric format.

(Image source: the BFLOAT16 paper)

The BFLOAT16 tensors are taken as input to the core compute kernels, represented as General Matrix Multiply (GEMM) operations, which produce FP32 tensors as output. The researchers developed a library called Quantlib, represented as Q in the figure, to implement the emulation in multiple deep learning frameworks. One of Quantlib's functions is to modify the elements of an input FP32 tensor to echo the behavior of BFLOAT16. Quantlib is also used to modify a copy of the FP32 weights to BFLOAT16 for the forward pass. The non-GEMM computations include batch normalization and activation functions. The bias tensors are always maintained in FP32, and the weight update step uses the FP32 copy of the weights to maintain model accuracy.

How does BFLOAT16 perform compared to FP32?

Convolutional Neural Networks

Convolutional neural networks (CNNs) are primarily used for computer vision applications such as image classification, object detection and semantic segmentation. AlexNet and ResNet-50 are used as the two representative models for the BFLOAT16 evaluation. For AlexNet, the BFLOAT16 emulation tracks the actual FP32 run very closely and achieves 57.2% top-1 and 80.1% top-5 accuracy. For ResNet-50, the BFLOAT16 emulation follows the FP32 baseline almost exactly and achieves the same top-1 and top-5 accuracy.

(Image source: the BFLOAT16 paper)

Similarly, the researchers were able to demonstrate that BFLOAT16 can represent tensor values across many application domains, including recurrent neural networks, Generative Adversarial Networks (GANs), and industrial-scale recommendation systems.
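To make the numeric format concrete, here is a minimal NumPy sketch of the kind of truncation such an emulation performs (an illustration, not Quantlib's actual code): a BFLOAT16 value is simply the top 16 bits of an IEEE-754 float32, keeping the sign bit, all 8 exponent bits, and the top 7 mantissa bits.

```python
import numpy as np

def to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Emulate BFLOAT16 by truncating float32 values (round-toward-zero).

    Real hardware typically rounds to nearest even, but truncation shows
    the format's key property: the exponent, and hence the dynamic range,
    is untouched; only mantissa precision is lost.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

x = np.array([3.14159265, 1.0e-30, 65504.0], dtype=np.float32)
print(to_bfloat16(x))  # tiny and huge values survive; only precision drops
```

Note that 1.0e-30 would underflow to zero in IEEE FP16 but is representable in BFLOAT16, which is exactly the gradient-range property the paper highlights.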
The researchers thus established that BFLOAT16 covers the same dynamic range as FP32 and that conversion to and from FP32 is straightforward. Maintaining the same range as FP32 matters because it means no hyperparameter tuning beyond what FP32 already needs is required for convergence. (A hyperparameter is a configuration value, such as the learning rate, that is chosen before training begins rather than learned from the data.) The researchers behind the paper expect to see industry-wide adoption of BFLOAT16 across emerging domains.

Recent reports suggest that Intel is planning to graft Google's BFLOAT16 onto its processors, as well as onto its initial Nervana Neural Network Processor for training, the NNP-T 1000. Pradeep Dubey, who directs the Parallel Computing Lab at Intel and is also one of the paper's researchers, believes that for deep learning the range of the processor is more important than the precision, which is the inverse of the rationale behind IEEE's floating point formats.

Users are finding it interesting that a half-precision format such as BFLOAT16 is suitable for deep learning applications.

https://twitter.com/kevlindev/status/1152984689268781056

https://twitter.com/IAmMattGreen/status/1152769690621448192

For more details, head over to the "A Study of BFLOAT16 for Deep Learning Training" paper.

Intel's new brain inspired neuromorphic AI chip contains 8 million neurons, processes data 1K times faster
Google plans to remove XSS Auditor used for detecting XSS vulnerabilities from its Chrome web browser
IntelliJ IDEA 2019.2 Beta 2 released with new Services tool window and profiling tools


#TechWontBuildIt: Entropic maintainer calls for a ban on Palantir employees contributing to the project and asks other open source communities to take a stand on ethical grounds

Sugandha Lahoti
19 Jul 2019
6 min read
The tech industry is being plagued by moral and ethical issues as top players become increasingly explicit about prioritizing profits over people and the planet. Recent times are rife with cases of tech companies actively selling facial recognition technology to law enforcement agencies, helping ICE separate immigrant families, taking large contracts with the Department of Defense, accelerating the extraction of fossil fuels, and deploying surveillance technology. As the US grows alarmingly dangerous for minority groups, asylum seekers and other vulnerable communities, the tech worker community has been spurred to organize and keep their employers in check. Since 2018, workers have been grouping together under the hashtag #TechWontBuildIt to push back against ethically questionable decisions made by their employers.

Most recently, several open source communities, activists and developers have strongly demonstrated against Palantir for its involvement with ICE. Palantir, a data analytics company founded by Peter Thiel, one of President Trump's most vocal supporters in Silicon Valley, has been called out for its association with Immigration and Customs Enforcement (ICE). According to emails obtained by WNYC, Palantir's mobile app FALCON is being used by ICE to carry out raids on immigrant communities as well as enable workplace raids.

According to the emails, an ICE supervisor sent an email to his officers before a planned spate of raids in New York City in 2017, ordering them to use a Palantir program, called FALCON mobile, for the operation. The email was sent in preparation for a worksite enforcement briefing on January 8, 2018. Two days later, ICE raided nearly a hundred 7-Elevens across the U.S. According to WNYC, ICE workplace raids led to 1,525 arrests over immigration status from October 2017 to October 2018. The email reads, "[REDACTION] we want all the team leaders to utilize the FALCON mobile app on your GOV iPhones, We will be using the FALCON mobile app to share info with the command center about the subjects encountered in the stores as well as team locations."

Other emails obtained by WNYC detail a Palantir staffer notifying an ICE agent to test out their FALCON mobile application because of his or her "possible involvement in an upcoming operation." Another message, from April 2017, shows a Palantir support representative instructing an agent on how to classify a datapoint so that Palantir's Investigative Case Management (ICM) platform could properly ingest records of a cell phone seizure.

In December 2018, Palantir told the New York Times' Dealbook that its technology is not used by the division of ICE responsible for carrying out the deportation and detention of undocumented immigrants. Palantir declined WNYC's requests for comment. Citing law enforcement "sensitivities," ICE also declined to comment on how it uses Palantir during worksite enforcement operations.

In May this year, new documents released by Mijente, an advocacy organization, revealed that Palantir was responsible for a 2017 operation that targeted and arrested family members of children crossing the border alone. The documents stand in stark contrast to what Palantir said its software was doing. As part of the operation, ICE arrested 443 people solely for being undocumented. Mijente has since urged Palantir to drop its contract with ICE and stop providing software to agencies that aid in tracking, detaining, and deporting migrants, refugees, and asylum seekers.
Open source communities, activists and developers strongly oppose Palantir

Following the revelation of Palantir's involvement with ICE, several open source developers have come out strongly against the company. The Entropic project, a JS package registry, is debating the idea of banning Palantir employees from participating in the project. Kat Marchán, an Entropic maintainer, posted on the forum, "I find it unconscionable for tech folks to be building the technological foundations for this deeply unethical and immoral (and fascist) practice, and I would like it if we, in our limited power as a community to actually affect the situation, officially banned any Palantir employees from participating in or receiving any sort of direct support from the Entropic community." She has further proposed explicitly banning Palantir employees from the Discourse, the Discord and the GitHub communities, as well as any other forums Entropic may use for coordinating the project.

https://twitter.com/maybekatz/status/1151355320314187776

Amazon is also facing renewed calls from employees and external immigration advocates to stop working with Palantir. According to an internal email obtained by Forbes, Amazon employees are recirculating a June 2018 letter to executives calling for Palantir to be kicked off Amazon Web Services. More than 500 Amazon employees have signed the letter addressed to CEO Jeff Bezos and AWS head Andy Jassy. Not just that, pro-immigration organizations such as Mijente and Jews for Racial and Economic Justice interrupted the keynote speech at Amazon's annual AWS Summit last Thursday.

https://twitter.com/altochulo/status/1149326296092164097

More than a dozen groups of activists also protested against Palantir Technologies in Palo Alto on July 12 over the company's provision of software facilitating ICE raids, detentions, and deportations. City residents joined in, swelling the protests to hundreds of people.

Back in August 2018, the Lerna team took a strong stand against ICE by modifying its MIT license to ban known ICE collaborators, including Microsoft, Palantir and Amazon, among others, from using Lerna.

To quote Meredith Whittaker, the Google walkout organizer who recently left the company, from her farewell letter, "Tech workers have emerged as a force capable of making real change, pushing for public accountability, oversight, and meaningful equity. And this right when the world needs it most." She further adds, "The stakes are extremely high. The use of AI for social control and oppression is already emerging, even in the face of developers' best of intentions. We have a short window in which to act, to build in real guardrails for these systems before AI is built into our infrastructure and it's too late."

Extraordinary times call for extraordinary measures. As the tech industry grapples with the consequences of its hypergrowth, technosolutionist mindset, where do tech workers draw the line? Can tech workers afford to be apolitical, or to separate their values from the work they do? There are no simple answers, but one thing is for sure: the questions must be asked and faced. Open source, as part of the commons, has a key role to play, and how it evolves in the next couple of years is likely to define the direction the world takes.
Lerna relicenses to ban major tech giants like Amazon, Microsoft, Palantir from using its software as a protest against ICE
Palantir's software was used to separate families in a 2017 operation reveals Mijente
ACLU files lawsuit against 11 federal criminal and immigration enforcement agencies for disclosure of information on government hacking

How bad is the gender diversity crisis in AI research? Study analysing 1.5 million arXiv papers says it's "serious"

Fatema Patrawala
18 Jul 2019
9 min read
Yesterday the team at Nesta, an innovation organization based in the UK, published research on gender diversity in the AI research workforce. The authors of the research are Juan Mateos Garcia, Director; Konstantinos Stathoulopoulos, Principal Researcher; and Hannah Owen, Programme Coordinator at Nesta.

https://twitter.com/JMateosGarcia/status/1151517641103872006

They prepared an analysis based purely on 1.5 million arXiv papers. The team claims it is the first ever study of gender diversity in AI that does not rely on convenience sampling or a proprietary database. The team posted on its official blog, "We conducted a large-scale analysis of gender diversity in AI research using publications from arXiv, a repository with more than 1.5 million preprints widely used by the AI community. We aim to expand the evidence base on gender diversity in AI research and create a baseline with which to interrogate the impact of current and future policies and interventions. To achieve this, we enriched the arXiv data with geographical, discipline and gender information in order to study the evolution of gender diversity in various disciplines, countries and institutions as well as examine the semantic differences between AI papers with and without female co-authors."

With this research, the team also aims to bring the prominent female figures it has identified under the spotlight.

Key findings from the research

A serious gender diversity crisis in AI research

The team found a severe gender diversity gap in AI research, with only 13.83% of authors being women. Moreover, in relative terms, the proportion of AI papers co-authored by at least one woman has not improved since the 1990s. Juan Mateos thinks this kind of crisis is a waste of talent that increases the risk of discriminatory AI systems.

https://twitter.com/JMateosGarcia/status/1151517642236276736

Location and research domain are significant drivers of gender diversity

Women in the Netherlands, Norway and Denmark are more likely to publish AI papers, while those in Japan and Singapore are less likely. In the UK, 26.62% of AI papers have at least one female co-author, placing the country at the 22nd spot worldwide. The US follows the UK at 25% of papers with at least one female co-author, though on unique female authors the US ranks one position above the UK.

Source: Nesta research report

Regarding research domains, women working in Physics and Education, Computer Ethics and other societal issues, and Biology are more likely to publish their work on AI than those working in Computer Science or Mathematics.

Source: Nesta research report

A significant gender diversity gap in universities, big tech companies and other research institutions

Apart from the University of Washington, every other academic institution and organisation in the dataset has less than 25% female AI researchers. As for big tech, only 11.3% of Google's employees who have published their AI research on arXiv are women; the proportion is similar for Microsoft (11.95%) and slightly better for IBM (15.66%).

Important semantic differences between AI papers with and without a female co-author

Examining publications on Machine Learning and Societal topics in the UK in 2012 and 2015, papers involving at least one female co-author tend to be more semantically similar to each other than to those without any female authors.
Moreover, papers with at least one female co-author tend to be more applied and socially aware, with terms such as fairness, human mobility, mental health, gender and personality among the most salient. Juan Mateos noted that this is an area which deserves further research.

https://twitter.com/JMateosGarcia/status/1151517647361781760

The top 15 women with the most AI publications on arXiv

Aarti Singh, Associate Professor in the Machine Learning Department at Carnegie Mellon University
Cordelia Schmid, part of the Google AI team, who holds a permanent research position at Inria Grenoble Rhone-Alpes
Cynthia Rudin, Associate Professor of computer science, electrical and computer engineering, statistical science and mathematics at Duke University
Devi Parikh, Assistant Professor in the School of Interactive Computing at Georgia Tech
Karen Livescu, Associate Professor at the Toyota Technological Institute at Chicago
Kate Saenko, Associate Professor in the Department of Computer Science at Boston University
Kristina Lerman, Project Leader at the Information Sciences Institute at the University of Southern California
Marilyn A. Walker, Professor in the Department of Computer Science at the University of California
Mihaela van der Schaar, John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Turing Fellow at The Alan Turing Institute in London
Petia Radeva, Professor in the Department of Mathematics and Computer Science, Faculty of Mathematics and Computer Science, at the Universitat de Barcelona
Regina Barzilay, Professor at the Massachusetts Institute of Technology and a member of the MIT Computer Science and Artificial Intelligence Laboratory
Svetha Venkatesh, ARC Australian Laureate Fellow, Alfred Deakin Professor and Director of the Centre for Pattern Recognition and Data Analytics (PRaDA) at Deakin University
Xiaodan Liang, Associate Professor at the School of Intelligent Systems Engineering, Sun Yat-sen University
Yonina C. Eldar, Professor of Electrical Engineering at the Weizmann Institute of Science, Israel
Zeynep Akata, Assistant Professor at the University of Amsterdam in the Netherlands

There are five other women researchers who could not be identified in the study.

Interview bites from a few women contributors and institutions

The research team also interviewed a few of the researchers and institutions identified in their work, and they think a system-wide reform is needed. When the team discussed the findings with the most cited female researcher, Mihaela van der Schaar, she said she felt her presence in the field has only started to be recognised, having begun her career in 2003: "I think that part of the reason for this is because I am a woman, and the experience of (the few) other women in AI in the same period has been similar."

Professor van der Schaar also described herself and many of her female colleagues as "faceless". She suggested that the work of celebrating leading women in the field could have a positive impact on the representation of women, as well as on the disparity in the recognition these women receive. This suggests that work is needed across the pipeline: not just early-stage intervention in education, but support for the women already in the field.
She also highlighted the importance of open discussion about the challenges women face in the AI sector, and said that workplace changes such as flexible hours are needed to enable researchers to participate in a fast-paced sector without sacrificing their family life.

The team further discussed the findings with the University of Washington's Eve Riskin, Associate Dean of Diversity and Access in the College of Engineering. Riskin described how much of her female faculty experienced a "toxic environment" and pervasive imposter syndrome. She also emphasized that more research is needed into the career trajectories of male and female researchers, including recruitment and retention.

Some recent examples of exceptional women in AI research and their contributions

While these women talk about the diversity gaps in the field, we have recently seen work from female researchers like Katie Bouman gain significant attention. Katie is a post-doctoral fellow at MIT whose algorithm led to an image of a supermassive black hole. But all the attention became a catalyst for a sexist backlash on social media and YouTube. It set off "what can only be described as a sexist scavenger hunt," as The Verge described it, in which an apparently small group of vociferous men questioned Bouman's role in the project. "People began going over her work to see how much she'd really contributed to the project that skyrocketed her to unasked-for fame."

Another incredible example in the field of AI research and ethics is Meredith Whittaker, an ex-Googler, now a program manager, activist, and co-founder of the AI Now Institute at New York University. Meredith is committed to the AI Now Institute, to her AI ethics work, and to organizing an accountable tech industry. On Tuesday, Meredith left Google after facing retaliation from the company for organizing last year's Google Walkout for Real Change protest, which demanded structural changes to ensure a safe and conducive work environment for everyone.

Other observations from the research and next steps

The research also highlights that women are as capable as men of contributing to technical topics, while they tend to contribute more than men to publications with a societal or ethical output. Some leading AI researchers shared their opinions on this: Petia Radeva, Professor at the Department of Mathematics and Computer Science at the University of Barcelona, was positive that the increasingly broad domains of application for AI, and the potential impact of this technology, will attract more women into the sector. Similarly, van der Schaar suggests that "publicising the interdisciplinary scope of possibilities and career paths that studying AI can lead to will help to inspire a more diverse group of people to pursue it. In parallel, the industry will benefit from a pipeline of people who are motivated by combining a variety of ideas and applying them across domains."

In the future, the research team will explore the temporal co-authorship network of AI papers to examine how the career trajectories of male and female researchers might differ. They will survey AI researchers on arXiv and investigate the drivers of the diversity gap in more detail through their innovation mapping methods. They also plan to extend this analysis to identify the representation of other underrepresented groups.
Meredith Whittaker, Google Walkout organizer and AI ethics researcher, is leaving the company, adding to its brain-drain woes over ethical concerns
"I'm concerned about Libra's model for decentralization", says co-founder of Chainspace, Facebook's blockchain acquisition
DeepMind's Alphastar AI agent will soon anonymously play with European StarCraft II players


Elon Musk's Neuralink unveils a “sewing machine-like” robot to control computers via the brain

Sugandha Lahoti
17 Jul 2019
8 min read
After two years of being super-secretive about its work, Neuralink, Elon Musk's neurotechnology company, has finally presented its progress in brain-computer interface technology. The livestream, uploaded on YouTube, showcases a "sewing machine-like" robot that can implant ultrathin threads deep into the brain, giving people the ability to control computers and smartphones using their thoughts. The company has received $158 million in funding for its brain-computer interface tech and has 90 employees.

Note: All images are taken from the Neuralink livestream video unless stated otherwise.

Elon Musk opened the presentation talking about the primary aim of Neuralink: to use brain-computer interface tech to understand and treat brain disorders, preserve and enhance the brain, and ultimately, and this may sound weird, "achieve a symbiosis with artificial intelligence". He added, "This is not a mandatory thing. It is a thing you can choose to have if you want. This is something that I think will be really important on a civilization-level scale."

Neuralink wants to build, record from and selectively stimulate as many neurons as possible across diverse brain areas. It has three goals:

Increase, by orders of magnitude, the number of neurons you can read from and write to in safe, long-lasting ways.
At each stage, produce devices that serve critical unmet medical needs of patients.
Make inserting a computer connection into your brain as safe and painless as LASIK eye surgery.

The system they have built was designed to be completely wireless, with practical bandwidth usable at home, and to last a long time. It is built around an N1 sensor, an 8mm wide, 4mm tall cylinder carrying 1,024 electrodes, and consists of a thin film from which the threads extend. The threads are placed into the brain using thin needles driven by a robotic system that works much like a sewing machine while avoiding blood vessels. The robot peels the threads off the N1 sensor one by one and places them in the brain: a needle grabs each thread by a small loop, and the robot then inserts it. The robot operates under the supervision of a human neurosurgeon, who lays out where the threads are placed. The needle the robot uses is just 24 microns wide. The procedure involves a 2mm incision near the ear, which is dilated to 8mm.

The threads

A robot implants threads using a needle

For the first patients, the Neuralink team is looking at four sensors connected via very small wires under the scalp to an inductive coil behind the ear. This is encased in a wearable device they call the "Link", which contains a Bluetooth radio and a battery. It will be controlled through an iPhone app.

Source: NYT

Neuralink/MetaLab iPhone app

The goal is to drill four 8mm holes into paralyzed patients' skulls and insert implants that will give them the ability to control computers and smartphones using their thoughts. For the first product, Neuralink is focusing on giving patients the ability to control their mobile device, and then redirecting output from the phone to a keyboard or a mouse. The company will seek U.S. Food and Drug Administration approval and is aiming for a first-in-human clinical study by 2020, treating upper cervical spinal cord injury. Those patients are expected to get four 1,024-channel sensors, one each in the primary motor cortex, supplementary motor area and premotor cortex, plus closed-loop feedback into the primary somatosensory cortex.
As reported by Bloomberg, which received an advance media briefing, Neuralink said it has performed at least 19 surgeries on animals with its robots and successfully placed the wires, which it calls "threads", about 87% of the time. In one case the company used a lab rat and implanted a USB-C port in its head. A wire attached to the port transmitted its thoughts to a nearby computer, where software recorded and analyzed its brain activity, measuring the strength of brain spikes. The amount of data gathered from the lab rat was about 10 times greater than what today's most powerful sensors can collect.

The flexibility of the Neuralink threads would be an advance, Terry Sejnowski, the Francis Crick Professor at the Salk Institute for Biological Studies in La Jolla, Calif., told the New York Times. However, he noted that the Neuralink researchers still need to prove that the insulation of their threads can survive for long periods in the brain's environment, whose salt solution deteriorates many plastics.

Musk's bizarre attempts to revolutionize the world are far from reality

Elon Musk is known for his dramatic promises and showmanship as much as for his eccentric projects. But how far they are grounded in reality is another matter. In May, he successfully launched his mammoth space mission, Starlink, sending 60 communications satellites into orbit; they will eventually be part of a single constellation providing high-speed internet to the globe. However, the launch went ahead only after being postponed twice to "update satellite software". Moreover, three of the 60 satellites have since lost contact with ground control teams, a SpaceX spokesperson said on June 28. Experts are already worried about how the Starlink constellation will contribute to the space debris problem. There are currently 2,000 operational satellites in orbit around Earth, according to the latest figures from the European Space Agency, and the completed Starlink constellation will drastically add to that number. Observers also noticed that some Starlink satellites had not initiated orbit raising after being released.

Musk's much-anticipated Hyperloop (first publicly mentioned in 2012) was supposed to shuttle passengers at near-supersonic speeds via pods traveling in a long underground tunnel, but it was soon reduced to a car in a very small tunnel. When the underground tunnel was unveiled to the media in California last December, reporters climbed into electric cars made by Musk's Tesla and were treated to a 40 mph ride along a bumpy path. Here as well, there have been public concerns about the impact on public infrastructure and the environment. The biggest questions surrounding Hyperloop's environmental impact are its effect on carbon dioxide emissions, the effect of its infrastructure on ecosystems, and the environmental footprint of the materials used to build it. Other concerns include noise pollution and how to repurpose hyperloop tubes and tunnels at the end of their lifespan.

Researchers from Tencent Keen Security Lab have criticized Tesla's self-driving car software, publishing a report detailing their successful attacks on Tesla firmware, including remote control over the steering and an adversarial example attack on the Autopilot that confuses the car into driving into the oncoming traffic lane. Musk has also promised a fully self-driving Tesla by 2020, which caused a lot of activity in the stock markets, but most are skeptical about this claim as well.
Whether Elon Musk’s AI symbiotic visions will come in existence in the foreseeable future is questionable. Neuralink's long-term goals are characteristically unrealistic, considering not much is known about the human brain; cognitive functions and their representation as brain signals are still an area where much further research is required. While Musk’s projects are known for their technical excellence, History shows a lack of thought into the broader consequences and cost of such innovations such as the ethical concerns, environmental and societal impacts. Neuralink’s implant is also prone to invading one’s privacy as it will be storing sensitive medical information of a patient. There is also the likelihood of it violating one’s constitutional rights such as freedom of speech, expression among others. What does it mean to live in a world where one’s thoughts are constantly monitored and not truly one’s own? Then, because this is an implant what if the electrodes malfunction and send wrong signals to the brain. Who will be accountable in such scenarios? Although the FDA will be probing into such questions, these are some questions any responsible company should ask of itself proactively while developing life-altering products or services. These are equally important aspects that are worthy of stage time in a product launch. Regardless, Musk’s bold claims and dramatic representations are sure to gain the attention of investors and enthusiasts for now. Elon Musk reveals big plans with Neuralink SpaceX shares new information on Starlink after the successful launch of 60 satellites What Elon Musk can teach us about Futurism & Technology Forecasting


Amazon’s partnership with NHS to make Alexa offer medical advice raises privacy concerns and public backlash

Bhagyashree R
12 Jul 2019
6 min read
Virtual assistants like Alexa and the smart speakers they live in are increasingly popular because of the convenience they come packaged with. It is good to have someone play a song or restock your groceries on just one command, or probably more than one command. You get the point! But how comfortable would you be if these assistants could give you medical advice?

Amazon has teamed up with the UK's National Health Service (NHS) to make Alexa your new medical consultant. The voice-enabled digital assistant will now answer your health-related queries by looking through the NHS website, whose content is vetted by professional doctors.

https://twitter.com/NHSX/status/1148890337504583680

The NHSX initiative to drive digital innovation in healthcare

Voice search arguably gives us the most "humanized" way of finding information on the web. One striking advantage of voice-enabled digital assistants is that the elderly, the blind and those who are unable to access the internet in other ways can also benefit from them. The UK's health secretary, Matt Hancock, believes that "embracing" such technologies will not only reduce the pressure general practitioners (GPs) and pharmacists face but also encourage people to take better control of their health care. He adds, "We want to empower every patient to take better control of their healthcare."

Partnering with Amazon is just one of many steps by the NHS to adopt technology for healthcare. The NHS launched a full-fledged unit named NHSX (where X stands for User Experience) last week. Its mission is to provide staff and citizens "the technology they need", with an annual investment of more than $1 billion a year. The Amazon partnership was announced last year, and the NHS plans to partner with other companies such as Microsoft in the future to achieve its goal of "modernizing health services."

Can we consider Alexa's advice safe?

Voice assistants are fun and convenient to use, but only when they actually work. Many a time the assistant fails to understand something and we have to yell the command again and again, which makes the experience outright frustrating. Furthermore, the track record of consulting the web to diagnose our symptoms has not been the most accurate one. Many Twitter users trolled the decision, saying that Alexa is not yet capable of simple tasks like playing a song accurately, and that the NHS budget could instead have been spent on additional NHS staff, lowering drug prices, and many other services. The public was also left sore that the government has given Amazon a new means to make a profit instead of forcing it to pay taxes. Others recalled the times Google (mis)diagnosed their symptoms.

https://twitter.com/NHSMillion/status/1148883285952610304
https://twitter.com/doctor_oxford/status/1148857265946079232
https://twitter.com/TechnicallyRon/status/1148862592254906370
https://twitter.com/withorpe/status/1148886063290540032

AI ethicists and experts raise data privacy issues

Amazon has been involved in several controversies around the privacy of Alexa users. Earlier this month, it admitted that some voice recordings made by Alexa are never deleted from the company's servers, even when the user manually deletes them. Another report in April this year revealed that when you speak to an Echo smart speaker, it is not only Alexa listening: Amazon employees potentially listen to your requests too.
Last month, two lawsuits were filed in Seattle alleging that Amazon is recording voiceprints of children using its Alexa devices without their consent. Last year, an Amazon Echo user in Portland, Oregon was shocked to learn that her Echo device had recorded a conversation with her husband and sent the audio file to one of his employees in Seattle. Amazon confirmed that this was an error caused by the device's microphone mishearing a series of words. Another creepy yet funny incident was when Alexa users started hearing an unprompted laugh from their smart speakers; Alexa laughed randomly even when the device was not being used.

https://twitter.com/CaptHandlebar/status/966838302224666624

Big tech firms including Amazon, Google, and Facebook constantly try to reassure users that their data is safe and that appropriate privacy measures are in place. But these promises are hard to believe amid so much news of data breaches involving these companies. Last year, the German computer magazine c't reported that a user received 1,700 Alexa voice recordings from Amazon when he asked for copies of the personal data Amazon held about him.

Many experts also raised concerns about using Alexa for medical advice. Berlin-based tech expert Manthana Stender calls the move a "corporate capture of public institutions".

https://twitter.com/StenderWorld/status/1148893625914404864

Dr. David Wrigley, a British medical doctor who works as a general practitioner, also asked how the voice recordings of people asking for health advice will be handled.

https://twitter.com/DavidGWrigley/status/1148884541144219648

Silkie Carlo, Director of Big Brother Watch, told the BBC, "Any public money spent on this awful plan rather than frontline services would be a breathtaking waste. Healthcare is made inaccessible when trust and privacy is stripped away, and that's what this terrible plan would do. It's a data protection disaster waiting to happen."

Prof Helen Stokes-Lampard, of the Royal College of GPs, believes the move has "potential", especially for minor ailments. She added that it is important individuals do independent research to ensure the advice given is safe, or it could "prevent people from seeking proper medical help and create even more pressure". She further noted that not everyone is comfortable using such technology or can afford it.

Amazon promises that the data will be kept confidential and will not be used to build profiles on customers. A spokesman told The Times, "All data was encrypted and kept confidential. Customers are in control of their voice history and can review or delete recordings."

Amazon is being sued for recording children's voices through Alexa without consent
Amazon Alexa is HIPAA-compliant: bigger leap in the health care sector
Amazon is supporting research into conversational AI with Alexa fellowships

British Airways set to face a record-breaking fine of £183m by the ICO over customer data breach

Sugandha Lahoti
08 Jul 2019
6 min read
UK’s watchdog ICO is all set to fine British Airways more than £183m over a customer data breach. In September last year, British Airways notified ICO about a data breach that compromised personal identification information of over 500,000 customers and is believed to have begun in June 2018. ICO said in a statement, “Following an extensive investigation, the ICO has issued a notice of its intention to fine British Airways £183.39M for infringements of the General Data Protection Regulation (GDPR).” Information Commissioner Elizabeth Denham said, "People's personal data is just that - personal. When an organisation fails to protect it from loss, damage or theft, it is more than an inconvenience. That's why the law is clear - when you are entrusted with personal data, you must look after it. Those that don't will face scrutiny from my office to check they have taken appropriate steps to protect fundamental privacy rights." How did the data breach occur? According to the details provided by the British Airways website, payments through its main website and mobile app were affected from 22:58 BST August 21, 2018, until 21:45 BST September 5, 2018. Per ICO’s investigation, user traffic from the British Airways site was being directed to a fraudulent site from where customer details were harvested by the attackers. Personal information compromised included log in, payment card, and travel booking details as well name and address information. The fraudulent site performed what is known as a supply chain attack embedding code from third-party suppliers to run payment authorisation, present ads or allow users to log into external services, etc. According to a cyber-security expert, Prof Alan Woodward at the University of Surrey, the British Airways hack may possibly have been a company insider who tampered with the website and app's code for malicious purposes. He also pointed out that live data was harvested on the site rather than stored data. https://twitter.com/EerkeBoiten/status/1148130739642413056 RiskIQ, a cyber security company based in San Francisco, linked the British Airways attack with the modus operandi of a threat group Magecart. Magecart injects scripts designed to steal sensitive data that consumers enter into online payment forms on e-commerce websites directly or through compromised third-party suppliers. Per RiskIQ, Magecart set up custom, targeted infrastructure to blend in with the British Airways website specifically and to avoid detection for as long as possible. What happens next for British Airways? The ICO noted that British Airways cooperated with its investigation, and has made security improvements since the breach was discovered. They now have 28 days to appeal. Responding to the news, British Airways’ chairman and chief executive Alex Cruz said that the company was “surprised and disappointed” by the ICO’s decision, and added that the company has found no evidence of fraudulent activity on accounts linked to the breach. He said, "British Airways responded quickly to a criminal act to steal customers' data. We have found no evidence of fraud/fraudulent activity on accounts linked to the theft. We apologise to our customers for any inconvenience this event caused." ICO was appointed as the lead supervisory authority to tackle this case on behalf of other EU Member State data protection authorities. 
What happens next for British Airways?

The ICO noted that British Airways cooperated with its investigation and has made security improvements since the breach was discovered. The airline now has 28 days to appeal. Responding to the news, British Airways' chairman and chief executive Alex Cruz said that the company was "surprised and disappointed" by the ICO's decision, and added that the company has found no evidence of fraudulent activity on accounts linked to the breach. He said, "British Airways responded quickly to a criminal act to steal customers' data. We have found no evidence of fraud/fraudulent activity on accounts linked to the theft. We apologise to our customers for any inconvenience this event caused."

The ICO was appointed as the lead supervisory authority on the case on behalf of the other EU member state data protection authorities. Under the GDPR "one stop shop" provisions, the data protection authorities in the EU whose residents have been affected will also have the chance to comment on the ICO's findings. The penalty is divided up between the other European data authorities, while the money that comes to the ICO goes directly to the Treasury. What is somewhat surprising is that the ICO disclosed the fine publicly even before the supervisory authorities had commented on its findings and a final decision had been taken based on their feedback, as pointed out by Simon Hania.

https://twitter.com/simonhania/status/1148145570961399808

Record-breaking fine appreciated by experts

The penalty imposed on British Airways is the first to be made public since the GDPR's new data privacy rules were introduced. The GDPR makes it mandatory to report data security breaches to the information commissioner, and it raised the maximum penalty to 4% of the penalized company's turnover. The fine would be the largest the ICO has ever issued; the previous record was the £500,000 it fined Facebook for the Cambridge Analytica scandal, the maximum possible under the 1998 Data Protection Act. The British Airways penalty amounts to 1.5% of its worldwide turnover in 2017, roughly 367 times Facebook's fine. In fact, it could have been even worse had the maximum penalty been levied: the full 4% of turnover would have meant a fine approaching £500m. Such a massive fine would clearly send a shudder down the spine of any big corporation responsible for handling customer data: if they compromise customers' data, a severe punishment is in order.

https://twitter.com/j_opdenakker/status/1148145361799798785

Carl Gottlieb, Privacy Lead & Data Protection Officer at Duolingo, summarized the key takeaways in a much-appreciated Twitter thread:

GDPR fines are for inappropriate security, as opposed to getting breached. Breaches are a good pointer but are not themselves actionable. So organisations need to implement security that is appropriate for their size, means, risk and need.
Security is an organisation's responsibility, whether you host IT yourself, outsource it or rely on someone else not getting hacked.
The GDPR has teeth against anyone that messes up security, but clearly action will be greatest where the human impact is most significant.
Threats of GDPR fines are what created change in privacy and security practices over the last two years (not orgs suddenly growing a conscience). And with very few fines so far, improvements have slowed; this will help.
Monetary fines are a great example to change behaviour in others, but a terrible punishment to drive change in an affected organisation. Other enforcement measures, e.g. ceasing processing personal data (such as banning new signups), would be much more impactful.

https://twitter.com/CarlGottlieb/status/1148119665257963521

Facebook fined $2.3 million by Germany for providing incomplete information about hate speech content
European Union fined Google 1.49 billion euros for antitrust violations in online advertising
French data regulator, CNIL imposes a fine of 50M euros against Google for failing to comply with GDPR


The road to Cassandra 4.0 – What does the future have in store?

Guest Contributor
06 Jul 2019
5 min read
In May 2019, DataStax hosted the Accelerate conference for Apache Cassandra™, inviting community members, DataStax customers, and other users to come together, discuss the latest developments around Cassandra, and find out more about its development. Nate McCall, Apache Cassandra Project Chair, presented the road to version 4.0 and what the community is focusing on for the future. So, what does the future really hold for Cassandra? The project has been going for ten years already, so what is left to add?

First off, listening to Nate's keynote, the approach to development has evolved. To understand that approach, it's important to look at who is committing updates to Cassandra. The number of organisations contributing to Cassandra has increased, and the companies involved in the Project Management Committee include some of the biggest in the world. The likes of Instagram, Facebook and Netflix have team members contributing to and leading the development of Cassandra because it is essential to their businesses. For DataStax, we continue to support the growth and development of Cassandra as an open source project through our own code contributions, our development and training, and our drivers that are available for the community and for our customers alike.

Having said all this, there are still areas where Cassandra can improve as we get ready for 4.0. From a development standpoint, the big things to look forward to, as mentioned in Nate's keynote, are:

An improved Repair model

For a distributed database, being able to carry on through any failure event is critical. After a failure, the affected nodes have to be brought back online and then catch up with the transactions they missed. Making nodes consistent is a big task, covered by the Repair function. In Cassandra 4.0, the aim is to make Repair smarter. For example, Cassandra can preview the impact of a repair on a host to check that the operation will go through successfully, and specific pull requests for data can also be supported.

Alongside this, a new transient replication feature should reduce the cost and bandwidth overhead associated with repair. By replicating temporary copies of data to supplement full copies, the overall cluster should be able to achieve higher levels of availability while significantly reducing the overall volume of storage required. For companies running very large clusters, the cost savings achievable here could be massive. (A brief sketch of what this looks like from a developer's perspective follows below.)
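As a taste of what transient replication could look like in practice, here is a short, hypothetical sketch using the DataStax Python driver. The "3/1" replication factor asks for three replicas, one of them transient; note that transient replication ships as an experimental feature in 4.0 and, to our understanding, has to be explicitly enabled in cassandra.yaml before this will work:

```python
# pip install cassandra-driver
from cassandra.cluster import Cluster

# Assumes a local Cassandra 4.0 node with the experimental flag
# enable_transient_replication: true set in cassandra.yaml.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# 'replication_factor': '3/1' means 3 replicas in total, 1 of them
# transient: the transient replica only keeps data that has not yet been
# repaired to the full replicas, cutting its storage footprint sharply.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy',
                        'replication_factor': '3/1'}
""")
```

The design choice is the interesting part: instead of every replica holding a full copy forever, the transient replica acts as a safety buffer that empties out as incremental repair catches the full replicas up.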
A Messaging rewrite

Efficient messaging between nodes is essential when your database is distributed. Cassandra 4.0 will have a new messaging system based on Netty, an asynchronous event-driven network application framework. In practice, using Netty will improve the performance of messaging between nodes within clusters and between clusters.

On top of this change, zero copy support will improve how quickly data can be streamed between nodes. It achieves this by modifying the streaming path to add additional information into the streaming header and then using zero copy APIs to transfer bytes to and from the network and disk. This allows nodes to transfer large files, such as SSTables, faster.

Cassandra and Kubernetes support

Adding new messaging support and being able to transfer SSTables means that Cassandra can add more support for Kubernetes, and that Kubernetes can do interesting things around Cassandra too. One area that has been discussed is dynamic cluster management, where the number of nodes and the volume of storage can be increased or decreased on demand.

Sidecars

Sidecars are additional functional tools designed to work alongside a main process. They fill a gap that is not part of the main application or service, but that should remain separate yet linked. For Cassandra, running sidecars allows developers to add more functionality to their operations, such as creating events on an application.

Java 11 support

Java 11 support has been added to the Cassandra trunk and will be present in 4.0. This will allow Cassandra users to run on Java 11, rather than version 8, which is no longer receiving free public updates.

Diagnostic events and logging

This will make it easier for teams to use events for a range of things, from security requirements through to logging activities and triggering tools.

There were two big trends I took from the event. The first, as Nate commented in his keynote, is that there is a definite need for more community events that can bring together people who care about Cassandra and get them working together. The second is that Apache Cassandra is essential to many companies today. Some of the world's largest internet companies and most valuable brands rely on Cassandra to achieve what they do. They are contributors and committers to Cassandra, and they have to be sure that Cassandra is ready to meet their requirements. For everyone using Cassandra, this means that versions have to be ready for use in production rather than shipping with issues still to be fixed. Things will get released when they are ready, rather than to meet a particular deadline, and the community will take the lead in ensuring that it is happy with any release.

Cassandra 4.0 is nearing release. It'll be out when it is ready. Whether you are looking at getting involved with the project through contributions, developing drivers or writing documentation, there is a warm welcome for everyone in the run-up to what should be a great release. I'm already looking forward to ApacheCon later this year!

Author Bio

Patrick McFadin is the vice president of developer relations at DataStax, where he leads a team devoted to making users of DataStax products successful. Previously, he was chief evangelist for Apache Cassandra and a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production; a chief architect at Hobsons; and an Oracle DBA and developer for over 15 years.