Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon

Author Posts - Data

37 Articles
article-image-enhancing-data-quality-with-cleanlab
Prakhar Mishra
11 Dec 2024
10 min read
Save for later

Enhancing Data Quality with Cleanlab

Prakhar Mishra
11 Dec 2024
10 min read
IntroductionIt is a well-established fact that your machine-learning model is only as good as the data it is fed. ML model trained on bad-quality data usually has a number of issues. Here are a few ways that bad data might affect machine-learning models -1. Predictions that are wrong may be made as a result of errors, missing numbers, or other irregularities in low-quality data. The model's predictions are likely to be inaccurate if the data used to train is unreliable.2. Bad data can also bias the model. The ML model can learn and reinforce these biases if the data is not representative of the real-world situations, which can result in predictions that are discriminating.3. Poor data also disables the the ability of ML model to generalize on fresh data. Poor data may not effectively depict the underlying patterns and relationships in the data.4. Models trained on bad-quality data might need more retraining and maintenance. The overall cost and complexity of model deployment could rise as a result.As a result, it is critical to devote time and effort to data preprocessing and cleaning in order to decrease the impact of bad data on ML models. Furthermore, to ensure the model's dependability and performance, it is often necessary to use domain knowledge to recognize and address data quality issues.It might come as a surprise, but gold-standard datasets like ImageNet, CIFAR, MNIST, 20News, and more also contain labeling issues. I have put in some examples below for reference -The above snippet is from the Amazon sentiment review dataset , where the original label was Neutral in both cases, whereas Cleanlab and Mechanical turk said it to be positive (which is correct).The above snippet is from the MNIST dataset, where the original label was marked to be 8 and 0 respectively, which is incorrect. Instead, both Cleanlab and Mechanical Turk said it to be 9 and 6 (which is correct).Feel free to check out labelerrors to explore more such cases in similar datasets.Introducing CleanlabThis is where Cleanlab can come in handy as your best bet. It helps by automatically identifying problems in your ML dataset, it assists you in cleaning both data and labels. This data centric AI software uses your existing models to estimate dataset problems that can be fixed to train even better models. The graphic below depicts the typical data-centric AI model development cycle:Apart from the standard way of coding all the way through finding data issues, it also offers Cleanlab Studio - a no-code platform for fixing all your data errors. For the purpose of this blog, we will go the former way on our sample use case.Getting Hands-on with CleanlabInstallationInstalling cleanlab is as easy as doing a pip install. I recommend installing optional dependencies as well, you never know what you need and when. I also installed sentence transformers, as I would be using them for vectorizing the text. Sentence transformers come with a bag of many amazing models, we particularly use ‘all-mpnet-base-v2’ as our choice of sentence-transformers for vectorizing text sequences. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. Feel free to check out this for the list of all models and their comparisons.pip install ‘cleanlab[all]’ pip install sentence-transformersDatasetWe picked the SMS Spam Detection dataset as our choice of dataset for doing the experimentation. It is a public set of labeled SMS messages that have been collected for mobile phone spam research with total instances of roughly ~5.5k. The below graphic gives a sneak peek of some of the samples from the dataset.Data PreviewCodeLet’s now delve into the code. For demonstration purposes, we inject a 5% noise in the dataset, and see if we are able to detect them and eventually train a better model.Note: I have also annotated every segment of the code wherever necessary for better understanding.import pandas as pd from sklearn.model_selection import train_test_split, cross_val_predict       from sklearn.preprocessing import LabelEncoder               from sklearn.linear_model import LogisticRegression       from sentence_transformers import SentenceTransformer       from cleanlab.classification import CleanLearning       from sklearn.metrics import f1_score # Reading and renaming data. Here we set sep=’\t’ because the data is tab       separated.       data = pd.read_csv('SMSSpamCollection', sep='\t')       data.rename({0:'label', 1:'text'}, inplace=True, axis=1)       # Dropping any instance of duplicates that could exist       data.drop_duplicates(subset=['text'], keep=False, inplace=True)       # Original data distribution for spam and not spam (ham) categories       print (data['label'].value_counts(normalize=True))       ham 0.865937       spam 0.134063       # Adding noise. Switching 5% of ham data to ‘spam’ label       tmp_df = data[data['label']=='ham']               examples_to_change = int(tmp_df.shape[0]*0.05)       print (f'Changing examples: {examples_to_change}')       examples_text_to_change = tmp_df.head(examples_to_change)['text'].tolist() changed_df = pd.DataFrame([[i, 'spam'] for i in examples_text_to_change])       changed_df.rename({0:'text', 1:'label'}, axis=1, inplace=True)       left_data = data[~data['text'].isin(examples_text_to_change)]       final_df = pd.concat([left_data, changed_df])               final_df.reset_index(drop=True, inplace=True)       Changing examples: 216       # Modified data distribution for spam and not spam (ham) categories       print (final_df['label'].value_counts(normalize=True))       ham 0.840016       spam 0.159984    raw_texts, raw_labels = final_df["text"].values, final_df["label"].values # Converting label into integers encoder = LabelEncoder() encoder.fit(raw_train_labels)       train_labels = encoder.transform(raw_train_labels)       test_labels = encoder.transform(raw_test_labels)       # Vectorizing text sequence using sentence-transformers transformer = SentenceTransformer('all-mpnet-base-v2') train_texts = transformer.encode(raw_train_texts)       test_texts = transformer.encode(raw_test_texts)       # Instatiating model instance model = LogisticRegression(max_iter=200) # Wrapping the sckit model around CL cl = CleanLearning(model) # Finding label issues in the train set label_issues = cl.find_label_issues(X=train_texts, labels=train_labels) # Picking top 50 samples based on confidence scores identified_issues = label_issues[label_issues["is_label_issue"] == True] lowest_quality_labels =       label_issues["label_quality"].argsort()[:50].to_numpy()       # Beauty print the label issue detected by CleanLab def print_as_df(index):    return pd.DataFrame(              {    "text": raw_train_texts,              "given_label": raw_train_labels,           "predicted_label": encoder.inverse_transform(label_issues["predicted_label"]),       },       ).iloc[index]       print_as_df(lowest_quality_labels[:5]) As we can see, Cleanlab assisted us in automatically removing the incorrect labels and training a better model with the same parameters and settings. In my experience, people frequently ignore data concerns in favor of building more sophisticated models to increase accuracy numbers. Improving data, on the other hand, is a pretty simple performance win. And, thanks to products like Cleanlab, it's become really simple and convenient.Feel free to access and play around with the above code in the Colab notebook hereConclusionIn conclusion, Cleanlab offers a straightforward solution to enhance data quality by addressing label inconsistencies, a crucial step in building more reliable and accurate machine learning models. By focusing on data integrity, Cleanlab simplifies the path to better performance and underscores the significance of clean data in the ever-evolving landscape of AI. Elevate your model's accuracy by investing in data quality, and explore the provided code to see the impact for yourself.Author BioPrakhar has a Master’s in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn
Read more
  • 2
  • 0
  • 3382

article-image-unlocking-insights-how-power-bi-empowers-analytics-for-all-users
Gogula Aryalingam
29 Nov 2024
5 min read
Save for later

Unlocking Insights: How Power BI Empowers Analytics for All Users

Gogula Aryalingam
29 Nov 2024
5 min read
IntroductionIn today’s data-driven world, businesses rely heavily on robust tools to transform raw data into actionable insights. Among these tools, Microsoft Power BI stands out as a leader, renowned for its versatility and user-friendliness. From its humble beginnings as an Excel add-in, Power BI has evolved into a comprehensive enterprise business intelligence platform, competing with industry giants like Tableau and Qlik. This journey of transformation reflects not only Microsoft’s innovation but also the growing need for accessible, scalable analytics solutions.As a data specialist who has transitioned from traditional data warehousing to modern analytics platforms, I’ve witnessed firsthand how Power BI empowers both technical and non-technical users. It has become an indispensable tool, offering capabilities that bridge the gap between data modeling and visualization, catering to everyone from citizen developers to seasoned data analysts. This article explores the evolution of Power BI, its role in democratizing data analytics, and its integration into broader solutions like Microsoft Fabric, highlighting why mastering Power BI is critical for anyone pursuing a career in analytics.The Changing Tide for Data Analysts When you think of business intelligence in the modern era, Power BI is often the first tool that comes to mind. However, this wasn't always the case. Originally launched as an add-in for Microsoft Excel, Power BI quickly evolved into a comprehensive enterprise business intelligence platform in a few years competing with the likes of Qlik and Tableau—a true testament to its capabilities. As a data specialist, what really impresses me about Power BI's evolution is how Microsoft has continuously improved its user-friendliness, making both data modeling and visualizing more accessible, catering to both technical professionals and business users.  As a data specialist, initially working with traditional data warehousing, and now with modern data lakehouse-based analytics platforms, I’ve come to appreciate the capabilities that Power BI brings to the table. It empowers me to go beyond the basics, allowing me to develop detailed semantic layers and create impactful visualizations that turn raw data into actionable insights. This capability is crucial in delivering truly comprehensive, end-to-end analytics solutions. For technical folk like me, by building on our experiences working with these architectures and the deep understanding of the technologies and concepts that drive them, integrating Power BI into the workflow is a smooth and intuitive process. The transition to including Power BI in my solutions feels almost like a natural progression, as it seamlessly complements and enhances the existing frameworks I work with. It's become an indispensable tool in my data toolkit, helping me to push the boundaries of what's possible in analytics. In recent years, there has been a noticeable increase in the number of citizen developers and citizen data scientists. These are non-technical professionals who are well-versed in their business domains and dabble with technology to create their own solutions. This trend has driven the development of a range of low-code/no-code, visual tools such as Coda, Appian, OutSystems, Shopify, and Microsoft’s Power Platform. At the same time, the role of the data analyst has significantly expanded. More organizations are now entrusting data analysts with responsibilities that were traditionally handled by technology or IT departments. These include tasks like reporting, generating insights, data governance, and even managing the organization’s entire analytics function. This shift reflects the growing importance of data analytics in driving business decisions and operations. As a data specialist, I’ve been particularly impressed by how Power BI has evolved in terms of user-friendliness, catering not just to tech-savvy professionals but also to business users. Microsoft has continuously refined Power BI, simplifying complex tasks and making it easy for users of all skill levels to connect, model, and visualize data. This focus on usability is what makes Power BI such a powerful tool, accessible to a wide range of users. For non-technical users, Power BI offers a short learning curve, enabling them to connect to and model data for reporting without needing to rely on Excel, which they might be more familiar with. Once the data is modeled, they can explore a variety of visualization options to derive insights. Moreover, Power BI’s capabilities extend beyond simple reporting, allowing users to scale their work into a full-fledged enterprise business intelligence system. Many data analysts are now looking to deepen their understanding of the broader solutions and technologies that support their work. This is where Microsoft Fabric becomes essential. Fabric extends Power BI by transforming it into a comprehensive, end-to-end analytics platform, incorporating data lakes, data warehouses, data marts, data engineering, data science, and more. With these advanced capabilities, technical work becomes significantly easier, enabling data analysts to take their skills to the next level and realize their full potential in driving analytics solutions.  If you're considering a career in analytics and business intelligence, it's crucial to master the fundamentals and gain a comprehensive understanding of the necessary skills. With the field rapidly evolving, staying ahead means equipping yourself with the right knowledge to confidently join this dynamic industry. The Complete Power BI Interview Guide is designed to guide you through this process, providing the essential insights and tools you need to jump on board and thrive in your analytics journey. ConclusionConclusionMicrosoft Power BI has redefined the analytics landscape by making advanced business intelligence capabilities accessible to a wide audience, from technical professionals to business users. Its seamless integration into modern analytics workflows and its ability to support end-to-end solutions make it an invaluable tool in today’s data-centric environment. With the rise of citizen developers and expanded responsibilities for data analysts, tools like Power BI and platforms like Microsoft Fabric are paving the way for more innovative and comprehensive analytics solutions.For aspiring professionals, understanding the fundamentals of Power BI and its ecosystem is key to thriving in the analytics field. If you're looking to master Power BI and gain the confidence to excel in interviews and real-world scenarios, The Complete Power BI Interview Guide is an invaluable resource. From the core PowerBI concepts to interview preparation and onboarding tips and tricks, The Complete Power BI Interview Guide is the ultimate resource for beginners and aspiring Power BI job seekers who want to stand out from the competition.Author BioGogula is an analytics and BI architect born and raised in Sri Lanka. His childhood was spent dreaming, while most of his adulthood was and is spent working with technology. He currently works for a technology and services company based out of Colombo. He has accumulated close to 20 years of experience working with a diverse range of customers across various domains, including insurance, healthcare, logistics, manufacturing, fashion, F&B, K-12, and tertiary education. Throughout his career, he has undertaken multiple roles, including managing delivery, architecting, designing, and developing data & AI solutions. Gogula is a recipient of the Microsoft MVP award more than 15 times, has contributed to the development and standardization of Microsoft certifications, and holds over 15 data & AI certifications. In his leisure time, he enjoys experimenting with and writing about technology, as well as organizing and speaking at technology meetups. 
Read more
  • 0
  • 0
  • 2079

article-image-mastering-midjourney-ai-world-for-design-success
Margarida Barreto
21 Nov 2024
15 min read
Save for later

Mastering Midjourney AI World for Design Success

Margarida Barreto
21 Nov 2024
15 min read
IntroductionIn today’s rapidly shifting world of design and trends, artificial intelligence (AI) has become a reality! It’s now a creative partner that helps designers and creative minds go further and stand out from the competition. One of the leading AI tools revolutionizing the design process is Midjourney. Whether you’re an experienced professional or a curious beginner, mastering this tool can enhance your creative workflow and open up new possibilities for branding, advertising, and personal projects. In this article, we’ll explore how AI can act as a brainstorming partner, help overcome creative blocks, and provide insights into best practices for unlocking its full potential. Using AI as my creative colleague AI tools like Midjourney have the potential to become more than just assistants; they can function as creative collaborators. Often, as designers, we hit roadblocks—times when ideas run dry, or creative fatigue sets in. This is where Midjourney steps in, acting as a colleague who is always available for brainstorming. By generating multiple variations of an idea, it can inspire new directions or unlock solutions that may not have been immediately apparent. The beauty of AI lies in its ability to combine data insights with creative freedom. Midjourney, for instance, uses text prompts to generate visuals that help spark creativity. Whether you’re building moodboards, conceptualizing ad campaigns, or creating a specific portfolio of images, the tool’s vast generative capabilities enable you to break free from mental blocks and jumpstart new ideas. Best practices and trends in AI for creative workflows While AI offers incredible creative opportunities, mastering tools like Midjourney requires understanding its potential and limits. A key practice for success with AI is knowing how to use prompts effectively. Midjourney allows users to guide the AI with text descriptions or just image input, and the more you fine-tune those prompts, the closer the output aligns with your vision. Understanding the nuances of these prompts—from image weights to blending modes—enables you to achieve optimal results. A significant trend in AI design is the combination of multiple tools. MidJourney is powerful, but it’s not a one-stop solution. The best results often come from integrating other third-party tools like Kling.ai or Gen 3 Runway. These complementary tools help refine the output, bringing it to a professional level. For instance, Midjourney might generate the base image, but tools like Kling.ai could animate that image, creating dynamic visuals perfect for social media or advertising. Additionally, staying up to date with AI updates and model improvements is crucial. Midjourney regularly releases new versions that bring refined features and enhancements. Learning how these updates impact your workflow is a valuable skill, as mastering earlier versions helps build a deeper understanding of the tool’s evolution and future potential. The book, The Midjourney Expedition, dives into these aspects, offering both beginners and advanced users a guide to mastering each version of the tool. Overcoming creative blocks and boosting productivity One of the most exciting aspects of using AI in design is its ability to alleviate creative fatigue. When you’ve been working on a project for hours or days, it’s easy to feel stuck. Here’s an example of how AI helped me when I needed to create a mockup for a client’s campaign. I wasn’t finding suitable mockups on regular stock photo sites, so I decided to create my own.  I went to the MidJourney website: www.midjourney.com  Logged in using my Discord or Google account.  Go to Create (step 1 in the image below), enter the prompt (3D rendering of a blank vertical lightbox in front of a wall of a modern building. Outdoor advertising mockup template, front view) in the text box ( step 2), click on the icon on the right (step 3) to open the settings box (step 4) change any settings you want. In this case, lets keep it with the default settings, I just adjusted the settings to make the image landscape-oriented and pressed enter on my keyboard. 4 images will appear, choose the one you like the most or rerun the job, until you fell happy with the result.  I got my image, but now I need to add the advertisement I had previously generated on Midjourney, so I can present to my client some ideas for the final mockup. Lets click on the image to enlarge it and get more options. On the bottom of the page lets click on Editor In Editor mode and with the erase tool selected, erase the inside of the billboard frame, next copy the URL of the image you want to use as a reference to be inserted in the billboard, and edit your prompt to: https://cdn.midjourney.com/urloftheimage.png  3D rendering of a, Fashion cover of "VOGUE" magazine, a beautiful girl in a yellow coat and sunglasses against a blue background inside the frame, vertical digital billboard mockup in front of a modern building with a white wall at night. Glowing light inside the frame., in high resolution and high quality. And press Submit.  This is the final result.  In case you master any editing tool, you can skip this last step and personalize the mockup, for instance, in Photoshop. This is just one example of how AI saved me time and allowed me to create a custom mockup for my client. For many designers, MidJourney serves as another creative tool, always fresh with new perspectives, and helping unlock ideas we hadn’t considered. Moreover, AI can save hours of work. It allows designers to skip repetitive tasks, such as creating multiple iterations of mockups or ad layouts. By automating these processes, creatives can focus on refining their work and ensuring that the main visual content serves a purpose beyond aesthetics. The challenges of writing about a rapidly evolving tool Writing The Midjourney Expedition was a unique challenge because I was documenting a technology that evolves daily. AI design tools like Midjourney are constantly being updated, with new versions offering improved features and refined models. As I wrote the book, I found myself not only learning about the tool but also integrating the latest advancements as they occurred. One of the most interesting parts was revisiting the older versions of MidJourney. These models, once groundbreaking, now seem like relics, yet they offer valuable insights into how far the technology has come. Writing about these early versions gave me a sense of nostalgia, but it also highlighted the rapid progress in AI. The same principles that amazed us two years ago have been drastically improved, allowing us to create more accurate and visually stunning images. The book is not just about creating beautiful images, it’s about practical applications. As a communication designer, I’ve always focused on using AI to solve real-world problems, whether for branding, advertising, or storytelling. And I find Midjourney to be a powerful solution for any creative who need to go one step further in a effective way. Conclusion AI is not the future of design, it’s already here! While I don’t believe AI will replace creatives, any creator who masters these tools may replace those who don’t use them. Tools like Midjourney are transforming how we approach creative workflows and even final outcomes, enabling designers to collaborate with AI, overcome creative blocks, and produce better results faster. Whether you're new to AI or an experienced user, mastering these tools can unlock new opportunities for both personal and professional projects. By combining Midjourney with other creative tools, you can push your designs further, ensuring that AI serves as a valuable resource for your creative tasks. Unlock the full potential of AI in your creative workflows with "The Midjourney Expedition". This book is for creative professionals looking to leverage Midjourney. You’ll learn how to produce stunning AI art, streamline your creative process, and incorporate AI into your work, all while gaining a competitive edge in your industry.Author BioMargarida Barreto is a seasoned communication designer with over 20 years of experience in the industry. As the author of The Midjourney Expedition, she empowers creatives to explore the full potential of AI in their workflows. Margarida specializes in integrating AI tools like Midjourney into branding, advertising, and design, helping professionals overcome creative challenges and achieve outstanding results. 
Read more
  • 0
  • 0
  • 899
Banner background image

article-image-simplifying-ai-pipelines-using-the-fti-architecture
Paul Iusztin
08 Nov 2024
15 min read
Save for later

Simplifying AI pipelines using the FTI Architecture

Paul Iusztin
08 Nov 2024
15 min read
IntroductionNavigating the world of data and AI systems can be overwhelming.Their complexity often makes it difficult to visualize how data engineering, research (data science and machine learning), and production roles (AI engineering, ML engineering, MLOps) work together to form an end-to-end system.As a data engineer, your work finishes when standardized data is ingested into a data warehouse or lake.As a researcher, your work ends after training the optimal model on a static dataset and registering it.As an AI or ML engineer, deploying the model into production often signals the end of your responsibilities.As an MLOps engineer, your work finishes when operations are fully automated and adequately monitored for long-term stability.But is there a more intuitive and accessible way to comprehend the entire end-to-end data and AI ecosystem?Absolutely—through the FTI architecture.Let’s quickly dig into the FTI architecture and apply it to a production LLM & RAG use case. Figure 1: The mess of bringing structure between the common elements of an ML system.Introducing the FTI architectureThe FTI architecture proposes a clear and straightforward mind map that any team or person can follow to compute the features, train the model, and deploy an inference pipeline to make predictions.The pattern suggests that any ML system can be boiled down to these 3 pipelines: feature, training, and inference.This is powerful, as we can clearly define the scope and interface of each pipeline. Ultimately, we have just 3 instead of 20 moving pieces, as suggested in Figure 1, which is much easier to work with and define.Figure 2 shows the feature, training, and inference pipelines. We will zoom in on each one to understand its scope and interface.Figure 2: FTI architectureBefore going into the details, it is essential to understand that each pipeline is a separate component that can run on different processes or hardware. Thus, each pipeline can be written using a different technology, by a different team, or scaled differently.The feature pipelineThe feature pipeline takes raw data as input, processes it, and outputs the features and labels required by the model for training or inference.Instead of directly passing them to the model, the features and labels are stored inside a feature store. Its responsibility is to store, version, track, and share the features.By saving the features into a feature store, we always have a state of our features. Thus, we can easily send the features to the training and inference pipelines.The training pipelineThe training pipeline takes the features and labels from the features stored as input and outputs a trained model(s).The models are stored in a model registry. Its role is similar to that of feature stores, but the model is the first-class citizen this time. Thus, the model registry will store, version, track, and share the model with the inference pipeline.The inference pipelineThe inference pipeline takes as input the features and labels from the feature store and the trained model from the model registry. With these two, predictions can be easily made in either batch or real-time mode.As this is a versatile pattern, it is up to you to decide what you do with your predictions. If it’s a batch system, they will probably be stored in a DB. If it’s a real-time system, the predictions will be served to the client who requested them.The most important thing you must remember about the FTI pipelines is their interface. It doesn’t matter how complex your ML system gets — these interfaces will remain the same.The final thing you must understand about the FTI pattern is that the system doesn’t have to contain only 3 pipelines. In most cases, it will include more.For example, the feature pipeline can be composed of a service that computes the features and one that validates the data. Also, the training pipeline can comprise the training and evaluation components.Applying the FTI architecture to a use caseThe FTI architecture is tool-agnostic, but to better understand how it works, let’s present a concrete use case and tech stack.Use case: Fine-tune an LLM on your social media data (LinkedIn, Medium, GitHub) and expose it as a real-time RAG application. Let’s call it your LLM Twin.As we build an end-to-end system, we split it into 4 pipelines:The data collection pipeline (owned by the DE team)The FTI pipelines (owned by the AI teams)As the FTI architecture defines a straightforward interface, we can easily connect the data collection pipeline to the ML components through a data warehouse, which, in our case, is a MongoDB NoSQL DB.The feature pipeline (the second ML-oriented data pipeline) can easily extract standardized data from the data warehouse and preprocess it for fine-tuning and RAG.The communication between the two is done solely through the data warehouse. Thus, the feature pipeline isn’t aware of the data collection pipeline and how it collected the raw data. Figure 3: LLM Twin high-level architectureThe feature pipeline does two things:chunks, embeds and loads the data to a Qdrant vector DB for RAG;generates an instruct dataset and loads it into a versioned ZenML artifact.The training pipeline ingests a specific version of the instruct dataset, fine-tunes an open-source LLM from HuggingFace, such as Llama 3.1, and pushes it to a HuggingFace model registry.During the research phase, we use a Comet ML experiment tracker to compare multiple fine-tuning experiments and push only the best one to the model registry.During production, we can automate the training job and use our LLM evaluation strategy or canary tests to check if the new LLM is fit for production.As the input dataset and output model registry are decoupled, we can quickly launch our training jobs using ML platforms like AWS SageMaker.ZenML orchestrates the data collection, feature, and training pipelines. Thus, we can easily schedule them or run them on demand orThe end-to-end RAG application is implemented in the inference pipeline side, which accesses fresh documents from the Qdrant vector DB and the latest model from the HuggingFace model registry.Here, we can implement advanced RAG techniques such as query expansion, self-query and rerank to improve the accuracy of our retrieval step for better context during the generation step.The fine-tuned LLM will be deployed to AWS SageMaker as an inference endpoint. Meanwhile, the rest of the RAG application is hosted as a FastAPI server, exposing the end-to-end logic as REST API endpoints.The last step is to collect the input prompts and generated answers with a prompt monitoring tool such as Opik to evaluate the production LLM for things such as hallucinations, moderation or domain-specific problems such as writing tone and style.SummaryThe FTI architecture is a powerful mindmap that helps you connect the dots in the complex data and AI world, as illustrated in the LLM Twin use case.Unlock the full potential of Large Language Models with the "LLM Engineer's Handbook" by Paul Iusztin and Maxime Labonne. Dive deeper into real-world applications, like the FTI architecture, and learn how to seamlessly connect data engineering, ML pipelines, and AI production. With practical insights and step-by-step guidance, this handbook is an essential resource for anyone looking to master end-to-end AI systems. Don’t just read about AI—start building it. Get your copy today and transform how you approach LLM engineering!Author BioPaul Iusztin is a senior ML and MLOps engineer at Metaphysic, a leading GenAI platform, serving as one of their core engineers in taking their deep learning products to production. Along with Metaphysic, with over seven years of experience, he built GenAI, Computer Vision and MLOps solutions for CoreAI, Everseen, and Continental. Paul's determined passion and mission are to build data-intensive AI/ML products that serve the world and educate others about the process. As the Founder of Decoding ML, a channel for battle-tested content on learning how to design, code, and deploy production-grade ML, Paul has significantly enriched the engineering and MLOps community. His weekly content on ML engineering and his open-source courses focusing on end-to-end ML life cycles, such as Hands-on LLMs and LLM Twin, testify to his valuable contributions.
Read more
  • 0
  • 0
  • 2948

article-image-how-to-face-a-critical-rag-driven-generative-ai-challenge
Mr. Denis Rothman
06 Nov 2024
15 min read
Save for later

How to Face a Critical RAG-driven Generative AI Challenge

Mr. Denis Rothman
06 Nov 2024
15 min read
This article is an excerpt from the book, "RAG-Driven Generative AI", by Denis Rothman. Explore the transformative potential of RAG-driven LLMs, computer vision, and generative AI with this comprehensive guide, from basics to building a complex RAG pipeline.IntroductionOn a bright Monday morning, Dakota sits down to get to work and is called by the CEO of their software company, who looks quite worried. An important fire department needs a conversational AI agent to train hundreds of rookie firefighters nationwide on drone technology. The CEO looks dismayed because the data provided is spread over many websites around the country. Worse, the management of the fire department is coming over at 2 PM to see a demonstration to decide whether to work with Dakata’s company or not. Dakota is smiling. The CEO is puzzled.  Dakota explains that the AI team can put a prototype together in a few hours and be more than ready by 2 PM and get to work. The strategy is to divide the AI team into three sub-teams that will work in parallel on three pipelines based on the reference Deep Lake, LlamaIndex and OpenAI RAG program* they had tested and approved a few weeks back.  Pipeline 1: Collecting and preparing the documents provided by the fire department for this Proof of Concept(POC). Pipeline 2: Creating and populating a Deep Lake vector store with the first batch of documents while the Pipeline 1 team continues to retrieve and prepare the documents. Pipeline 3: Indexed-based RAG with LlamaIndex’s integrated OpenAI LLM performed on the first batch of vectorized documents. The team gets to work at around 9:30 AM after devising their strategy. The Pipeline 1 team begins by fetching and cleaning a batch of documents. They run Python functions to remove punctuation except for periods and noisy references within the content. Leveraging the automated functions they already have through the educational program, the result is satisfactory.  By 10 AM, the Pipeline 2 team sees the first batch of documents appear on their file server. They run the code they got from the RAG program* to create a Deep Lake vector store and seamlessly populate it with an OpenAI embedding model, as shown in the following excerpt: from llama_index.core import StorageContext vector_store_path = "hub://denis76/drone_v2" dataset_path = "hub://denis76/drone_v2" # overwrite=True will overwrite dataset, False will append it vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)  Note that the path of the dataset points to the online Deep Lake vector store. The fact that the vector store is serverless is a huge advantage because there is no need to manage its size, storage process and just begin to populate it in a few seconds! Also, to process the first batch of documents, overwrite=True, will force the system to write the initial data. Then, starting the second batch,  the Pipeline 2 team can run overwrite=False, to append the following documents. Finally,  LlamaIndex automatically creates a vector store index: storage_context = StorageContext.from_defaults(vector_store=vector_store) # Create an index over the documents index = VectorStoreIndex.from_documents(documents, storage_context=storage_context) By 10:30 AM, the Pipeline 3 team can visualize the vectorized(embedded) dataset in their Deep Lake vector store. They create a LlamaIndex query engine on the dataset: from llama_index.core import VectorStoreIndex vector_store_index = VectorStoreIndex.from_documents(documents) … vector_query_engine = vector_store_index.as_query_engine(similarity_top_k=k, temperature=temp, num_output=mt) Note that the OpenAI Large Language Model is seamlessly integrated into LlamaIndex with the following parameters: k, in this case, k=3, specifies the number of documents to retrieve from the vector store. The retrieval is based on the similarity of embedded user inputs and embedded vectors within the dataset. temp, in this case temp=0.1, determines the randomness of the output. A low value such as 0.1 forces the similarity search to be precise. A higher value would allow for more diverse responses, which we do not want for this technological conversational agent. mt, in this case, mt=1024, determines the maximum number of tokens in the output. A cosine similarity function was added to make sure that the outputs matched the sample user inputs: from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2') def calculate_cosine_similarity_with_embeddings(text1, text2):     embeddings1 = model.encode(text1)     embeddings2 = model.encode(text2)     similarity = cosine_similarity([embeddings1], [embeddings2])     return similarity[0][0] By 11:00 AM, all three pipeline teams are warmed up and ready to go full throttle! While the Pipeline 2 team was creating the vector store and populating it with the first batch of documents, the Pipeline 1 team prepared the next several batches. At 11:00 AM, Dakota gave the green light to run all three pipelines simultaneously. Within a few minutes, the whole RAG-driven generative AI system was humming like a beehive! By 1:00 PM, Dakota and the three pipeline teams were working on a PowerPoint slideshow with a copilot. Within a few minutes, it was automatically generated based on their scenario. At 1:30 PM, they had time to grab a quick lunch. At 2:00 pm, the fire department management, Dakota’s team, and the CEO of their software company were in the meeting room.  Dakota’s team ran the PowerPoint slide show and began the demonstration with a simple input:  user_input="Explain how drones employ real-time image processing and machine learning algorithms to accurately detect events in various environmental conditions." The response displayed was satisfactory: Drones utilize real-time image processing and machine learning algorithms to accurately detect events in various environmental conditions by analyzing data captured by their sensors and cameras. This technology allows drones to process visual information quickly and efficiently, enabling them to identify specific objects, patterns, or changes in the environment in real-time. By employing these advanced algorithms, drones can effectively monitor and respond to different situations, such as wildfires, wildlife surveys, disaster relief efforts, and agricultural monitoring with precision and accuracy. Dakota’s team then showed that the program could track and display the original documents the response was based on. At one point, the fire department’s top manager, Taylor, exclaimed, “Wow, this is impressive! It’s exactly what we were looking for! " Of course, Dakato’s CEO began discussing the number of users, cost, and timelines with Taylor. In the meantime, Dakota and the rest of the fire department’s team went out to drink some coffee and get to know each other. Fire departments intervene at short notice efficiently for emergencies. So can expert-level AI teams! https://github.com/Denis2054/RAG-Driven-Generative-AI/blob/main/Chapter03/Deep_Lake_LlamaIndex_OpenAI_RAG.ipynb ConclusionIn facing a high-stakes, time-sensitive challenge, Dakota and their AI team demonstrated the power and efficiency of RAG-driven generative AI. By leveraging a structured, multi-pipeline approach with tools like Deep Lake, LlamaIndex, and OpenAI’s advanced models, the team was able to integrate scattered data sources quickly and effectively, delivering a sophisticated, real-time conversational AI prototype tailored for firefighter training on drone technology. Their success showcases how expert planning, resourceful use of AI tools, and teamwork can transform a complex project into a streamlined solution that meets client needs. This case underscores the potential of generative AI to create responsive, practical solutions for critical industries, setting a new standard for rapid, high-quality AI deployment in real-world applications.Author Bio Denis Rothman graduated from Sorbonne University and Paris-Diderot University, and as a student, he wrote and registered a patent for one of the earliest word2vector embeddings and word piece tokenization solutions. He started a company focused on deploying AI and went on to author one of the first AI cognitive NLP chatbots, applied as a language teaching tool for Mo�t et Chandon (part of LVMH) and more. Denis rapidly became an expert in explainable AI, incorporating interpretable, acceptance-based explanation data and interfaces into solutions implemented for major corporate projects in the aerospace, apparel, and supply chain sectors. His core belief is that you only really know something once you have taught somebody how to do it.
Read more
  • 0
  • 0
  • 2098

article-image-unlocking-excels-potential-extend-your-spreadsheets-with-r-and-python
Steven Sanderson, David Kun
17 Oct 2024
5 min read
Save for later

Unlocking Excel's Potential: Extend Your Spreadsheets with R and Python

Steven Sanderson, David Kun
17 Oct 2024
5 min read
Introduction Are you an Excel user looking to push your data analysis capabilities beyond the familiar cells and formulas? If so, you're about to embark on a transformative journey. With the integration of R and Python, you can elevate Excel into a powerhouse of advanced data analysis and visualization. In this blog post, inspired by the book "Extending Excel with Python and R," co-authored by myself and David Kun, we will dive deep into practical implementation, focusing on how to automate data visualization in Excel using these powerful programming languages. Practical Implementation: Creating Advanced Data Visualizations In the world of data analysis, visual representation is key to understanding complex datasets. Excel, while equipped with basic charting tools, often requires enhancement for more sophisticated visuals. By integrating R and Python, you can create dynamic and detailed graphs that bring your data to life. Task: Automating Data Visualization with Python and R Step-by-Step Guide Step 1: Set Up Your Environment Before jumping into visualization, ensure you have the necessary tools installed. You will need: Excel: Ensure you have a version that supports VBA (Visual Basic for Applications). Python: Install Python on your computer. You can download it from the official Python website. R: Similarly, install R from the Comprehensive R Archive Network (CRAN). Libraries: For Python, install `pandas`, `matplotlib`, and `openpyxl` using pip. For R, install `ggplot2` and `readxl`.  Step 2: Importing Data Begin by importing your Excel data into Python or R. Here’s a Python snippet using pandas:  In R, use readxl:  Step 3: Creating Visualizations Python Example Using Matplotlib, you can create a simple line plot: Python Example   R Example With ggplot2, the process is equally straightforward where df is some data frame loaded in:  Step 4: Integrating Visualizations into Excel Once your visualization is created, the next step is to integrate it back into Excel. This can be done manually, or you can automate it using VBA or an API endpoint. Python Integration Using openpyxl, you can embed images:   R Integration For R, you might automate this process using R scripts that interact with Excel via VBA or other packages like `officer`.  Step 5: Automating the Entire Workflow To automate, consider using Python scripts executed from Excel VBA or R scripts called through Excel's RExcel plugin. This way, you can refresh data and update visualizations with minimal effort. Conclusion By integrating R and Python with Excel, you unlock a realm of possibilities for data visualization and analysis, turning Excel from a simple spreadsheet tool into a comprehensive data analytics suite. This guide provides a snapshot of what you can achieve, and with further exploration, the potential is limitless. Author Bio Steven Sanderson is a Manager of Applications with a deep passion for data and its compliments: cleaning, analysis, visualization and communication. He is known primarily for his work in R. After his MPH, Steven continued his work in the healthcare industry as a clinical decision support analyst working his way up to Manager of Applications at Stony Brook Medicine for Patient Financial Services. He currently is focused on expanding functions in his healthyverse suite of packages while also slimming them down and expanding their robustness. He also now enjoys helping mentor junior employees to set them up for success. David Kun is a mathematician and actuary who has always worked in the gray zone between quantitative teams and ICT, aiming to build a bridge. He is a co-founder and director of Functional Analytics, the creator of the ownR infinity platform. As a data scientist, he also uses ownR for his daily work. His projects include time series analysis for demand forecasting, computer vision for design automation, and visualization. Looking to Master Excel with Python and R?If you're excited about extending Excel’s capabilities with powerful tools like Python and R, Extending Excel with Python and R, authored by Steven Sanderson, David Kun, offers an in-depth guide to seamlessly integrating these languages into your Excel workflow. It covers everything from automating data tasks to advanced visualizations, all tailored for Excel enthusiasts.
Read more
  • 0
  • 0
  • 1583
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-understanding-the-fundamentals-of-analytics-teams-with-john-k-thompson
Expert Network
06 Apr 2021
6 min read
Save for later

Understanding the Fundamentals of Analytics Teams with John K. Thompson

Expert Network
06 Apr 2021
6 min read
Key-takeaways:   Data scientists need a tailored portfolio of projects that they own and manage to have a sense of autonomy.  The top skill or personality trait a successful data scientist can possess (and should possess) is curiosity.  Managing a successful analytics team and individual analytics professionals is different than managing any other type of team.  Data and analytics will be ubiquitous in the very near future. Analytics teams are different than any other team in the organization and analytics professionals are unique variant of creative professionals. Providing challenging, interesting and valuable work in the form of a personal project portfolio of work for a data scientist can be done and needs to be done to ensure productivity, job satisfaction, value delivery, and retention.  We interviewed Analytics Leader, and bestselling author, John K Thompson on data analytics, the future of analytics and his recent book, Building Analytics Teams. The interview in detail:  1. What are the fundamental concepts of building and managing a high-performing analytics team?  It is critically important to remember that data scientists are creative and intelligent people. They cannot be managed well in a command-and-control environment.  Data scientists need a tailored portfolio of projects that they own and manage to have a sense of autonomy. If they have a portfolio of projects and can manage their time and effort, the productivity of the team will be much higher than what is typically seen in teams managed in a traditional manner.  The relationship of the analytics leader with their peers and executives of the company is critically important to the success of the analytics team.  It is very important to realize that most analytics project fail at the point of where analytical models are to be implemented in production systems. 2. Tell us about your book, Building Analytics Teams. How is your book new and/or different from other books on Data Analytics?   Building Analytics Teams is focused on the practical challenges faced by people who are building and managing high performance analytics teams and the staff members who make up those analytics teams.    The book is different from other books in that it examines the process of building and managing a team from a holistic view.  The book considers the organization framework, the required processes, the people, the projects, the problems, and pitfalls.    The content of the book guides the reader through how to navigate these challenges and provides illustrations and examples of how to be successful.  The book is a “how to” guide on how to successfully manage the analytics process in a large corporate environment.  3. What was the motivation behind writing this book?   I have not seen a book like this, and I wish I had a book like this earlier in my career.  I have built a number of analytics teams. While building and growing those teams, I noticed certain recurring patterns. I wanted to address the misconceptions and the misperceptions people hold about analytics teams.    Analytics teams are unique. The team members who are successful have a different mindset and attitude toward project work and team work. I wanted to communicate the differences inherent in a high-performance analytics team when compared to other teams.  Also, I wanted to communicate that managing a successful analytics team and individual analytics professionals is different than managing any other type of team.    I wanted to write a guide for managers and analytics professionals to help them understand how the broader organization views them and how they can interface and interact with their peers in related organizational functions to increase the probability of joint success.  4. What should be the starting point for data analytics enthusiasts aiming to begin their journey in Data Analytics? How do you think your book will help them in their journey?  It depends on where they are starting their journey.    If they are in the process of completing their undergraduate or graduate studies, I would suggest that they take classes in programming, data science or analytics.    If they are professionals, I would suggest that they take classes on Coursera, Udemy or any other on-line educational platform to see if they have a real interest in, and affinity for, analytics.  If they do have an interest, then they should start working on analytics for themselves to test out analytical techniques, apply critical thinking and try to understand what they can see or cannot see in the data.  If that works out and their interest remains, they should volunteer for projects at work that will enable them to work with data and analytics in a work setting.  If they have the education, the affinity and the skill, then apply for a data science position.  Grab some data and make a difference!  5. What are the key skills required for someone to be successful working in Data Analytics? What are the pain points/challenges one should know?  The top skill or personality trait a successful data scientist can possess (and should possess) is curiosity. Without curiosity, you will find it difficult to be successful as a data scientist.  It helps to be talented and well educated, but I have met many stellar data scientists that are neither.  Beyond those traits, it is more important to be diligent and persistent.  The most successful business analysts and data scientists I have ever worked with were all naturally and perpetually curious and had a level of diligence and persistence that was impressive.  As for pain points and challenges; data scientists need to work on improving their listening skills, their written & verbal communication and presentation skills.  All data scientists need improvement in these areas.  6. What is the future of analytics? What will we see next?  I do believe that we are entering an era where data and analytics will be increasing in importance in all human endeavors. Certainly, corporate use of data and analytics will increase in importance, hence the focus of the book.    But beyond corporations, the active and engaged use of data and analytics will increase in importance and daily use in managing multiple aspects of - people’s personal lives, academic pursuits, governmental policy, military operations, humanitarian aid, tailoring of products and services; building of roads, towns and cities, planning of traffic patterns, provisioning of local federal and state services, intergovernmental relationships and more.    There will not be an element of human endeavor that will not be touched and changed by data and analytics. Data is ubiquitous today and data and analytics will be ubiquitous in the very near future.  We will see more discussions on who owns data and who should be able to monetize data.  We will experience increasing levels of AI and analytics across all systems that we interact with, and most of it will be unnoticed and operate in the background for our benefit.  About:  John K. Thompson is an international technology executive with over 30 years of experience in the business intelligence and advanced analytics fields. Currently, John is responsible for the global Advanced Analytics and Artificial Intelligence team and efforts at CSL Behring. 
Read more
  • 0
  • 0
  • 6237

article-image-imran-bashir-on-the-fundamentals-of-blockchain-its-myths-and-an-ideal-path-for-beginners
Expert Network
15 Feb 2021
5 min read
Save for later

Imran Bashir on the Fundamentals of Blockchain, its Myths, and an Ideal Path for Beginners

Expert Network
15 Feb 2021
5 min read
With the invention of Bitcoin in 2008, the world was introduced to a new concept, Blockchain, which revolutionized the whole of society. It was something that promised to have an impact upon every industry. This new concept is the underlying technology that underpins Bitcoin.  Blockchain technology is the backbone of cryptocurrencies, and it has applications in finance, government, media, and many other industries.   Some describe blockchain as a revolution, whereas another school of thought believes that it is going to be more evolutionary, and it will take many years before any practical benefits of blockchain reach fruition. This thinking is correct to some extent, but, in Imran Bashir’s opinion, the revolution has already begun. It is a technology that has an impact on current technologies too and possesses the ability to change them at a fundamental level.  Let’s hear from Imran on fundamentals of blockchain technology, its myths and his recent book, Mastering Blockchain, Third Edition. What is blockchain technology? How would you describe it to a beginner in the field? Blockchain is a distributed ledger which runs on a decentralized peer to peer network. First introduced with Bitcoin as a mechanism that ensures security of the electronic cash system, blockchain has now become a prime area of research with many applications in a variety of industries and sectors.   What should be the starting point for someone aiming to begin their journey in Blockchain? Focus on the underlying principles and core concepts such as distributed systems, consensus, cryptography, and development using no helper tools in the start. Once you understand the basics and the underlying mechanics, then you can use tools such as truffle or some other framework to make your developer life easier, however it is extremely important to learn the underlying concepts first.   What is the biggest myth about blockchain? Sometimes people believe that blockchain IS cryptocurrency, however that is not the case. Blockchain is the underlying technology behind cryptocurrencies that ensures the security, and integrity of the system and prevents double spends. However, cryptocurrency can be considered one application of blockchain technology out of many.      “Blockchain is one of the most disruptive emerging technologies today.” How much do you agree with this? Indeed, it is true.  Blockchain is changing the way we do business. In the next 5 years or so, financial systems, government systems and other major sectors will all have blockchain integrated in one way or another.   What are the factors driving development of the mainstream adoption of Blockchain? The development of standards, interoperability efforts, and consortium blockchain are all contributing towards mainstream adoption of blockchain. Also demand for more security, transparency, and decentralization in some sectors are also key drivers behind more adoption, e.g., a prime solution for decentralized sovereign identity is blockchain.   How do you explain the term bitcoin mining? Mining is a colloquial term used to describe the process of creating new bitcoins where a miner repeatedly tries to find a solution to a math puzzle and whoever finds it first wins the right to create new block and earn bitcoins as a reward.    How can Blockchain protect the Global economy? I think with the trust, transparency and security guarantees provided by blockchain we can perceive a future where financial crime can be limited to a great degree. That can have a good impact on the global economy. Furthermore, the development of CDBCs (central bank digital currencies) are expected to have a major impact on the economy and help to stabilize it. From an inclusion point of view, blockchain can allow unbanked populations to play a role in the global financial system. If cryptocurrencies replace the current monetary system, then because of the decentralized nature of blockchain, major cost savings can be achieved as no intermediaries or banks will be required, and a peer to peer, extremely low cost, global financial system can emerge which can transform the world economy. The entire remittance ecosystem can evolve into an extremely low cost, secure, real-time system which can include people who were porously unbanked. The possibilities are endless.   Tell us a bit about your book, Mastering Blockchain, Third Edition? Mastering Blockchain, Third Edition is a unique combination of theory and practice. Not only does it provides a holistic view of most areas of blockchain technology, it also covers hands on exercises using Ethereum, Bitcoin, Quroum and Hyperledger to equip readers with both theory and practical knowledge of blockchain technology. The third edition includes four new chapters on hot topics such as blockchain consensus, tokenization, Ethereum 2 and Enterprise blockchains.  About the author  Imran Bashir has an M.Sc. in Information Security from Royal Holloway, University of London, and has a background in software development, solution architecture, infrastructure management, and IT service management. He is also a member of the Institute of Electrical and Electronics Engineers (IEEE) and the British Computer Society (BCS). Imran has extensive experience in both the public and financial sectors, having worked on large-scale IT projects in the public sector before moving to the financial services industry. Since then, he has worked in various technical roles for different financial companies in Europe's financial capital, London. 
Read more
  • 0
  • 0
  • 6675

article-image-understand-quickbooks-online-desktop-online-security-use-cases-and-more-with-crystalynn-shelton-a-certified-quickbooks-proadvisor
Vincy Davis
27 Dec 2019
8 min read
Save for later

Understand Quickbooks online/desktop, online security, use cases, and more with Crystalynn Shelton, a certified QuickBooks ProAdvisor

Vincy Davis
27 Dec 2019
8 min read
Quickbooks, the accounting software package developed and marketed by Intuit is targeted towards small and medium-sized businesses. It offers on-premises accounting applications and cloud-based versions that can undertake remote access capabilities like remote payroll assistance and outsourcing, electronic payment functions, online banking, and reconciliation, mapping features, and more. To know more about Quickbooks’ latest features and its learning curve for beginners, we did a quick interview with Crystalynn Shelton, a certified QuickBooks ProAdvisor and author of the book ‘Mastering QuickBooks 2020’. With more than 10 years of experience in Quickbooks, Shelton says Quickbooks is not only user-friendly but also cost-effective. Further, when asked about her views on QuickBooks online, Shelton points out that its live unlimited technical support is one of its main features  On Quickbooks, its benefits and use cases What are some of the advantages of Quickbooks that sets it apart from its competitors?  QuickBooks has a number of advantages that set them apart from its competitors. First, it is affordable for most small businesses. Whether you purchase an Online subscription (starting at $20/month) or a desktop product (starting at a one-time fee of $199), there is something for every budget. Another benefit of using QuickBooks is the program is very user-friendly. Most small business owners purchase the software and are able to set it up without having an IT person on staff.  In addition, there are a number of training videos, an extensive help menu within the program not to mention live tech support if you need it. Because QuickBooks is the most widely used accounting software program used by small businesses, most accountants and CPAs are familiar with the program. Some of these folks are certified ProAdvisors (like myself). They can offer consulting, training, and even bookkeeping services to small business owners who use QuickBooks. Can you elaborate on how small businesses can take benefit from Quickbooks? Also, how does Quickbooks simplifies tasks for them?  While there are numerous reasons why small businesses decide to use QuickBooks, there are five that tend to be the most common reasons: small businesses who can’t afford to hire a bookkeeper, small businesses who have outgrown the use of Excel spreadsheets and need a more sophisticated way to track income and expenses, small businesses who need financial statements in order to apply for a line of credit or business loan, small businesses whose tax professional will no longer accept a shoebox of receipts to file taxes. QuickBooks simplifies bookkeeping by allowing you to track all aspects of the business in one place: accounts payable, accounts receivable, income, and expenses. It uses simple language such as “people who owe you” (aka accounts receivable) or “what you owe to others” (aka accounts payable) to help business owners without prior bookkeeping knowledge comprehend the program. QuickBooks allows you to accept credit card payments from customers so you can get paid faster and easily reconcile payments to open invoices. Not to mention you can reduce (if not eliminate) manual data entry by connecting all of your business bank and credit card accounts to QBO.  Can you elaborate on how your book ‘Mastering QuickBooks 2020’ will prepare bookkeepers and accounting students in learning the ropes of QuickBooks? Also, how does the learning curve look like for users who have no bookkeeping knowledge and no experience with QuickBooks? This book was written with the assumption that the reader has no experience or knowledge of bookkeeping. We use simple language to explain how QuickBooks works and we have also provided screenshots to support the concepts being taught.  Chapter 1 includes a section that covers bookkeeping basics which will help non-accountants gain a better understanding of the terminology used in the field of accounting as well as QuickBooks. This information will help aspiring accountants build on their existing bookkeeping knowledge.  In addition, we have included the behind the scenes debits and credits for certain transactions to help accounting students prepare for the CPA exam or other academic tests. Shelton’s views on QuickBooks Online and Desktop What are your thoughts on QuickBooks Online and Quickbooks Desktop? What are the benefits of cloud accounting over Desktop? Do factors such as the size of an organization, or its maturity matter in choosing between the online and the desktop version? There are several benefits of using cloud accounting software over the desktop. Cloud accounting software allows you to manage your business from any device with an internet connection; whereas desktop limits you to a desktop computer. With cloud accounting software like QuickBooks Online, you can give anyone access to your QuickBooks data without them having to travel to your office. Cloud accounting software includes automatic real-time updates of your data. Unlike desktop software, you don’t have to worry about backing up your data with Online; its automatically done for you. Finally, QuickBooks Online includes unlimited live technical support. This is an invaluable feature for small business owners who are managing their own books and need the ability to get help when they need it. The size of an organization, structure, and length of time in business can definitely impact whether a business should choose QuickBooks Online or desktop. As a QuickBooks ProAdvisor, one of the first things I do is conduct an assessment to determine what the needs of my clients are. This involves documenting the details of their current processes (i.e. invoicing customers, paying bills, managing inventory, etc.)  Once I have this information, I am able to determine whether QuickBooks desktop is right or if QuickBooks Online is the best fit. If both products are ideal, I provide my clients with the downsides (if any) of going with one product over the other. This gives my clients all of the information they need to make an informed decision. On how Quickbook secures online data How does Quickbooks help in securing payments? How does QuickBooks keep online data safe? To secure payments, intuit transmits, support, protect, and access all cardholder information in compliance with the Payment Card Industry’s (PCI) data security standards. Additional security precautions Intuit has implemented are as follows: All data between Intuit servers and their customers is encrypted with at least 128-bit TLS, and all copies of daily backup data are encrypted with 256-bit AES encryption. Data is kept secure with multiple servers housed in Tier-3 data centers that have strict access controls and real-time video monitoring of the data center. All servers are hardened Linux installations, which are monitored in real-time and kept up-to-date with security patches. Can you suggest some best practices (at least five) that will help Quickbook aspirants in saving time and becoming a Quickbook pro? There are several ways you can save time and become proficient in QuickBooks Online. First, I recommend that you use QuickBooks on a daily basis. The more hands-on experience with QuickBooks, the more proficient you will become. Second, take the time to properly set up your QuickBooks account before you start entering transactions.  In Chapter 2, we provide you with a detailed checklist which includes what information you need to setup QuickBooks. By taking the time to set up customers, vendors, the chart of accounts, and your products and services upfront, the less time you will spend having to do it later on when you are trying to enter data. Third, all aspiring bookkeepers and accountants should get certified in QuickBooks Online. Certification is offered through Intuit and it is free.  As a Certified QuickBooks ProAdvisor, you get access to product discounts, marketing materials to promote bookkeeping services to prospective clients, a certification badge and designation you can put on business cards, websites and email signature lines.  Fourth, utilize keyboard shortcuts. They will save you time as you navigate the program. We have included a list of QBO keyboard shortcuts in the appendix of this book. Finally, connect as many bank and credit card accounts as you can to QBO. By doing so, you will reduce the amount of manual data entry required which will help you to keep your books up-to-date. If you want to learn how to build the perfect budget, simplify tax return preparation, manage inventory, track job costs, generate income statements and financial reports, check out Crystalynn’s book ‘Mastering QuickBooks 2020’. This book will work for a small business owner, bookkeeper, or accounting student who wants to learn how to make the most of QuickBooks Online. About the author Crystalynn Shelton is a licensed Certified Public Accountant, a certified QuickBooks ProAdvisor and has been certified in QuickBooks for more than 10 years. Crystalynn is currently a staff writer for Fit Small Business and an Adjunct Instructor at UCLA Extension where she teaches accounting, bookkeeping and QuickBooks to hundreds of small business owners and accounting students each year. Her previous experience includes working at Intuit (QuickBooks) as a Sr. Learning Specialist.  MongoDB’s CTO Eliot Horowitz on what’s new in MongoDB 4.2, Ops Manager, Atlas, and more New QGIS 3D capabilities and future plans presented by Martin Dobias, a core QGIS developer Greg Walters on PyTorch and real-world implementations and future potential of GANs Elastic marks its entry in security analytics market with Elastic SIEM and Endgame acquisition “The challenge in Deep Learning is to sustain the current pace of innovation”, explains Ivan Vasilev, machine learning engineer
Read more
  • 0
  • 0
  • 5581

article-image-greg-walters-on-pytorch-and-real-world-implementations-and-future-potential-of-gans
Vincy Davis
13 Dec 2019
10 min read
Save for later

Greg Walters on PyTorch and real-world implementations and future potential of GANs

Vincy Davis
13 Dec 2019
10 min read
Introduced in 2014, GANs (Generative Adversarial Networks) was first presented by Ian Goodfellow and other researchers at the University of Montreal. It comprises of two deep networks, the generator which generates data instances, and the discriminator which evaluates the data for authenticity. GANs works not only as a form of generative model for unsupervised learning, but also has proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning. In this article, we are in conversation with Greg Walters, one of the authors of the book 'Hands-On Generative Adversarial Networks with PyTorch 1.x', where we discuss some of the real-world applications of GANs. According to Greg, facial recognition and age progression will one of the areas where GANs will shine in the future. He believes that with time GANs will soon be visible in more real-world applications, as with GANs the possibilities are unlimited. On why PyTorch for building GANs Why choose PyTorch for GANs? Is PyTorch better than other popular frameworks like Tensorflow? Both PyTorch and Tensorflow are good products. Tensorflow is based on code from Google and PyTorch is based on code from Facebook. I think that PyTorch is more pythonic and (in my opinion) is easier to learn. Tensorflow is two years older than PyTorch, which gives it a bit of an edge, and does have a few advantages over PyTorch like visualization and deploying trained models to the web. However, one of the biggest advantages that PyTorch has is the ability to handle distributed training. It’s much easier when using PyTorch. I’m sure that both groups are looking at trying to lessen the gaps that exist and that we will see big changes in both. Refer to Chapter 4 of my book to learn how to use PyTorch to train a GAN model. Have you had a chance to explore the recently released PyTorch 1.3 version? What are your thoughts on the experimental feature - named tensors? How do you think it will help developers in getting a more readable and maintainable code? What are your thoughts on other features like PyTorch Mobile and 8-bit model quantization for mobile-optimized AI? The book was originally written to introduce PyTorch 1.0 but quickly evolved to work with PyTorch 1.3.x. Things are moving very quickly for PyTorch, so it presents an evermoving target.  Named tensors are very exciting to me. I haven’t had a chance to spend a tremendous amount of time on them yet, but I plan to continue working with them and explore them deeply. I believe that they will help make some of the concepts of manipulating tensors much easier for beginners to understand and read and understand the code created by others. This will help create more novel and useful GANs for the future. The same can be said for PyTorch Mobile. Expanding capabilities to more (and less expensive) processor types like ARM creates more opportunities for programmers and companies that don’t have the high-end capabilities. Consider the possibilities of running a heavy-duty AI on a $35 Raspberry Pi. The possibilities are endless. With PyTorch Mobile, both Android and iOS devices can benefit from the new advances in image recognition and other AI programs. The 8-bit model quantization allows tensor operations to be done using integers rather than floating-point values, allowing models to be more compact. I can’t begin to speculate on what this will bring us in the way of applications in the future. You can read Chapter 2 of my book to know more about the new features in PyTorch 1.3. On challenges and real-world applications of GANs GANs have found some very interesting implementations in the past year like a deepfake that can animate your face with just your voice, a neural GAN to fight fake news, a CycleGAN to visualize the effects of climate change, and more. Most of the GAN implementations are built for experimentation or research purposes. Do you think GANs can soon translate to solve real-world problems? What do you think are the current challenge that restrict GANs from being implemented in real-world scenarios? Yes. I do believe that we will see GANs starting to move to more real-world applications. Remember that in the grand scheme of things, GANs are still fairly new. 2014 wasn’t that long ago. We will see things start to pop in 2020 and move forward from there. As to the current challenges, I think that it’s simply a matter of getting the word out. Many people who are conversant with Machine Learning still haven’t heard of GANs, mainly due to the fact that they are so busy with what they know and are comfortable with, so they haven’t had the time and/or energy to explore GANs yet. That will change. Of course, things change on almost a daily basis, so who can guess where we will be in another two years? Some of the existing and future applications that GANs can help implement include new photo-realistic scenes for video games, movies, and television, taking sketches from designers and making realistic photographs in both the fashion industry and architecture, taking a partial facial image and making a rotated view for better facial recognition, age progression and regression and so much more. Pretty much anything with a pattern, be it image or text can be manipulated using GANs. There are a variety of GANs available out there. How should one approach them in terms of problem solving? What are the other possible ways to group GANs? That’s a very hard question to answer. You are correct, there are a large number of GANs in “the wild” and some work better for some things than others. That was one of the big challenges of writing the book.  Add to that, new GANs are coming out all the time that continue to get better and better and extend the possibility matrix. The best suggestion that I could make here is to use the resources of the Internet and read, read and read. Try one or two to see what works best for your application. Also, create your own category list that you create based on your research. Continue to refine the categories as you go. Then share your findings so others can benefit from what you’ve learned. New GANs implementations and future potential In your book, 'Hands-On Generative Adversarial Networks with PyTorch 1.x', you have demonstrated how GANs can be used in image restoration problems, such as super-resolution image reconstruction and image inpainting. How do SRGAN help in improving the resolution of images and performing image inpainting? What other deep learning models can be used to address image restoration problems? What are other keep image related problems where GANs are useful and relevant? Well, that is sort of like asking “how long is a piece of string”. Picture a painting in a museum that has been damaged from fire or over time. Right now, we have to rely on very highly trained experts who spend hundreds of hours to bring the painting back to its original glory. However, it’s still an approximation of what the expert THINKS the original was to be. With things like SRGAN, we can see old photos “restored” to what they were originally. We already can see colorized versions of some black and white classic films and television shows. The possibilities are endless. Image restoration is not limited to GANs, but at the moment seems to be one of the most widely used methods. Fairly new methods like ARGAN (Artifact Reduction GAN) and FD-GAN (Face De-Morphing GAN or Feature Distilling GAN) are showing a lot of promise. By the time I’m finished with this interview, there could be three or more others that will surpass these.  ARGAN is similar and can work with SRGAN to aid in image reconstruction. FD-GAN can be used to work with human position images, creating different poses from a totally different pose. This has any number of possibilities from simple fashion shots too, again, photo-realistic images for games, movies and television shows. Find more about image restoration from Chapter 7 of my book. GANs are labeled as innovative due to its ability to generate fake data that looks real. The latest developments in GANs allows it to generate high-dimensional fake data or image video that can easily go undetected. What is your take on the ethical issues surrounding GANs? Don’t you think developers should target creating GANs that will be good for humanity rather than developing scary AI capabilities? Good question. However, the same question has been asked about almost every advance in technology since rainbows were in black and white. Take, for example, the discussion in Chapter 6 where we use CycleGAN to create van Gogh like images. As I was running the code we present, I was constantly amazed by how well the Generator kept coming up with better fakes that looked more and more like they were done by the Master. Yes, there is always the potential for using the technology for “wrong” purposes. That has always been the case. We already have AI that can create images that can fool talent scouts and fake news stories. J. Hector Fezandie said back in 1894, "with great power comes great responsibility" and was repeated by Peter Parker’s Uncle Ben thanks to Stan Lee. It was very true then and is still just as true. How do you think GANs will be contributing to AI innovations in the future? Are you expecting/excited to see an implementation of GANs in a particular area/domain in the coming years? 5 years ago, GANs were pretty much unknown and were only in the very early stages of reality.  At that point, no one knew the multitude of directions that GANs would head towards. I can’t begin to imagine where GANs will take us in the next two years, much let the far future. I can’t imagine any area that wouldn’t benefit from the use of GANs. One of the subjects we wanted to cover was facial recognition and age progression, but we couldn’t get permission to use the dataset. It’s a shame, but that will be one of the areas that GANs will shine in for the future. Things like biomedical research could be one area that might really be helped by GANs. I hate to keep using this phrase, but the possibilities are unlimited. If you want to learn how to build, train, and optimize next-generation GAN models and use them to solve a variety of real-world problems, read Greg’s book ‘Hands-On Generative Adversarial Networks with PyTorch 1.x’. This book highlights all the key improvements in GANs over generative models and will help guide you to make the GANs with the help of hands-on examples. What are generative adversarial networks (GANs) and how do they work? [Video] Generative Adversarial Networks: Generate images using Keras GAN [Tutorial] What you need to know about Generative Adversarial Networks ICLR 2019 Highlights: Algorithmic fairness, AI for social good, climate change, protein structures, GAN magic, adversarial ML and much more Interpretation of Functional APIs in Deep Neural Networks by Rowel Atienza
Read more
  • 0
  • 0
  • 5042
article-image-prof-rowel-atienza-discusses-the-intuition-behind-deep-learning-techniques-advances-in-gans
Packt Editorial Staff
30 Sep 2019
6 min read
Save for later

Prof. Rowel Atienza discusses the intuition behind deep learning, advances in GANs & techniques to create cutting-edge AI models

Packt Editorial Staff
30 Sep 2019
6 min read
In recent years, deep learning has made unprecedented progress in vision, speech, natural language processing and understanding, and other areas of data science. Developments in deep learning techniques, including GANs, variational autoencoders and deep reinforcement learning, are creating impressive AI results. For example, DeepMind's AlphaGo Zero became a game changer in AI research when it beat world champions in the game of Go. In this interview, Professor Rowel Atienza, author of the book Advanced Deep Learning with Keras talks about the recent developments in the field of deep learning. This book is a comprehensive guide to the advanced deep learning techniques available today, so you can create your own cutting-edge AI. This book strikes a balance between advanced concepts in deep learning and practical implementations with Keras. Key takeaways from the interview The intuition of deep learning is built on the fact that the deeper the network gets, the more feature representations the network learns in order to solve complex real-world problems. The objective of deep learning is to enable agents to be more robust to unforeseen events and to lessen the dependency on huge data. Advances in GANs enable us to generate high-dimensional fake data such as high-resolution images or videos that look very convincing. Deep learning tackles the curse of dimensionality by finding efficient data structures and layers that could represent complex data in the most efficient manner. The interview in detail What is the intuition behind deep learning? What are the recent developments in deep learning? Rowel Atienza: Deep learning is built on the intuition that the deeper the network gets, the more feature representations the network learns in order to solve complex real-world problems. Unlike machine learning, deep learning learns these features automatically from data in different degrees of supervision. There are many recent developments in deep learning. There are advances on graph neural networks because people are realizing the limits of NLP (Natural Language Processing), CNN (Convolution Neural Networks), and RNN (Recurrent Neural Networks) in representing more complex data structures such as social network, 3D shapes, molecular structures, etc. Implementing the causality in reasoning on data is another area of strong interest. Deep learning is strong on correlation not on discovering causality in data. Meta learning or learning to learn is also another area of interest. The objective is to enable agents to be more robust to unforeseen events and to lessen the dependency on huge data. What are different deep learning techniques to create successful AI? RA: A successful AI is dependent on two things: 1) deep domain knowledge and 2) deep understanding of state of the art techniques that will work on the domain problem. Domain knowledge comes from someone who is very familiar with the domain problem. This person is not necessarily knowledgeable in AI. This domain knowledge is then modelled in AI to automate the process of problem solving. How deep learning tackles the curse of dimensionality? RA: One of the goals of deep learning is to keep on finding efficient data structures and layers that could represent complex data in the most efficient manner. For example, geometric deep learning is able to circumvent the limitations of representing and learning from 3D data by avoiding inefficient 3D convolutions. There is still so much to be done in this space. What is autoencoders? What is the need of autoencoders in deep learning? How do you create an autoencoder? RA: Autoencoders compress high dimensionality data into low dimensionality code without losing important information. Low-dimensional code is suitable for further processing by other deep learning models such as in generative models like GANs and VAEs. Autoencoder can easily be implemented using two networks, an encoder and decoder. The depth, width, and type of layers are dependent on the original data to be encoded. Why are GANs so innovative? RA: GANs are innovative since they are good in generating fake data that look real. It is something that is hard to accomplish using other generative models. The advances in GANs enable us to generate high-dimensional fake data such as high resolution image or video that look very convincing. Tell us a little bit about this book? What makes this book necessary? What gap does it fill? RA: Advanced Deep Learning with Keras focuses on recent advances on deep learning It starts with a quick review of deep learning concepts (NLP, CNN, RNN). The discussions on deep neural networks, autoencoders, generative adversarial network (GAN), variational autoencoders (VAE), and deep reinforcement learning (DRL) follow. The book is important for everyone who would like to understand advanced concepts on deep learning and their corresponding implementation in Keras. The current version has in depth focus on generative models (autoencoders, GANs, VAEs) that could be used in-practical setting. The DRL explains the core concepts of value-based and policy-based methods in reinforcement learning and the corresponding working implementations in Keras which are difficult to make them right. About the Book Advanced Deep Learning with Keras is a comprehensive guide to the advanced deep learning techniques available today, so you can create your own cutting-edge AI. Using Keras as an open-source deep learning library, you'll find hands-on projects throughout that show you how to create more effective AI with the latest techniques. About the Author Rowel Atienza is an Associate Professor at the Electrical and Electronics Engineering Institute of the University of the Philippines, Diliman. He holds the Dado and Maria Banatao Institute Professorial Chair in Artificial Intelligence. Rowel has been fascinated with intelligent robots since he graduated from the University of the Philippines. He received his MEng from the National University of Singapore for his work on an AI-enhanced four-legged robot. He finished his Ph.D. at The Australian National University for his contribution to the field of active gaze tracking for human-robot interaction. Deep learning models have massive carbon footprints, can photonic chips help reduce power consumption? Machine learning experts on how we can use machine learning to mitigate and adapt to the changing climate Google launches beta version of Deep Learning Containers for developing, testing and deploying ML applications
Read more
  • 0
  • 0
  • 4357

article-image-listen-researcher-rowel-atienza-discusses-artificial-intelligence-deep-learning-and-why-we-dont-need-to-fear-a-robot-ruled-future-podcast
Richard Gall
08 Apr 2019
2 min read
Save for later

Listen: researcher Rowel Atienza discusses artificial intelligence, deep learning, and why we don't need to fear a robot-ruled future [Podcast]

Richard Gall
08 Apr 2019
2 min read
Artificial intelligence threats are regularly talked up by the media. This is largely because the area is widely misunderstood. The robot revolution and dangerous algorithms are, unfortunately, much sexier than math and statistics. Artificial intelligence isn't really that scary. And while it does pose many challenges for society, it's essential to remember that these are practical challenges that don't exist in some abstract realm. They are rather engineering and ethical problems that we can all help solve. In this edition of the Packt podcast, we spoke to Rowel Atienza about the reality of artificial intelligence. In particular we wanted to understand the practical realities behind the buzz. As an Associate Professor at the University of the Philipines researching numerous different aspects of artificial intelligence - and author of Advanced Deep Learning with Keras  - he's someone with experience and insight on what really matters across the field. Getting past the artificial intelligence hype with Rowel Atienza In the episode we discussed: The distinction between AI, machine learning and deep learning Why artificial intelligence is so hot right now The key machine learning frameworks - TensorFlow, PyTorch, and Keras How they compare and why Rowel loves Keras The importance of ethics and transparency Essential skills for someone starting or building a career in the field How far are we really are from AGI Listen here:  https://soundcloud.com/packt-podcasts/were-still-very-far-from-robots-taking-over-society-rowel-atienza-on-deep-learning-and-ai
Read more
  • 0
  • 0
  • 7326

article-image-deep-learning-is-not-an-optimum-solution-for-every-problem-faced-an-interview-with-valentino-zocca
Sunith Shetty
14 Nov 2018
11 min read
Save for later

“Deep learning is not an optimum solution for every problem faced”: An interview with Valentino Zocca

Sunith Shetty
14 Nov 2018
11 min read
Over the past few years, we have seen some advanced technologies in artificial intelligence shaping human life. Deep learning (DL) has become the main driving force in bringing new innovations in almost every industry. We are sure to continue to see DL everywhere. Most of the companies including startups are already integrating deep learning into their own day-to-day process. Deep learning techniques and algorithms have made building advanced neural networks practically feasible, thanks to high-level open source libraries such as TensorFlow, Keras, PyTorch and more. We recently interviewed Valentino Zocca, a deep learning expert and the author of the book, Python Deep Learning. Valentino explains why deep learning is getting so much hype, and what's the roadmap ahead in terms of new technologies and libraries. He will also talks about how major vendors and tech-savvy startups adopt deep learning within their organization. Being a consultant and an active developer he is expecting a better approach than back-propagation for carrying out various deep learning tasks. Author’s Bio Valentino Zocca graduated with a Ph.D. in mathematics from the University of Maryland, USA, with a dissertation in symplectic geometry, after having graduated with a laurel in mathematics from the University of Rome. He spent a semester at the University of Warwick. After a post-doc in Paris, Valentino started working on high-tech projects in the Washington, D.C. area and played a central role in the design, development, and realization of an advanced stereo 3D Earth visualization software with head tracking at Autometric, a company later bought by Boeing. At Boeing, he developed many mathematical algorithms and predictive models, and using Hadoop, he has also automated several satellite-imagery visualization programs. He has since become an expert on machine learning and deep learning and has worked at the U.S. Census Bureau and as an independent consultant both in the US and in Italy. He has also held seminars on the subject of machine learning and deep learning in Milan and New York. Currently, Valentino lives in New York and works as an independent consultant to a large financial company, where he develops econometric models and uses machine learning and deep learning to create predictive models. But he often travels back to Rome and Milan to visit his family and friends. Key Takeaways Deep learning is one of the most adopted techniques used in image and speech recognition and anomaly detection research and development areas. Deep learning is not the optimum solution for every problem faced. Based on the complexity of the challenge, the neural network building can be tricky. Open-source tools will continue to be in the race when compared to enterprise software. More and more features are expected to improve on providing efficient and powerful deep learning solutions. Deep learning is used as a tool rather than a solution across organizations. The tool usage can differ based on the problem faced. Emerging specialized chips expected to bring more developments in deep learning to mobile, IoT and security domain. Valentino Zocca states We have a quantity vs. quality problem. We will be requiring better paradigms and approaches in the future which can be improved through research driven innovative solutions instead of relying on hardware solutions. We can make faster machines, but our goal is really to make more intelligent machines for performing accelerated deep learning and distributed training. Full Interview Deep learning is as much infamous as it is famous in the machine learning community with camps supporting and opposing the use of DL passionately. Where do you fall on this spectrum? If you were given a chance to convince the rival camp with 5-10 points on your stand about DL, what would your pitch be like? The reality is that Deep Learning techniques have their own advantages and disadvantages. The areas where Deep Learning clearly outperforms most other machine learning techniques are in image and speech recognition and anomaly detection. One of the reasons why Deep Learning does so much better is that these problems can be decomposed into a hierarchical set of increasingly complex structures, and, in multi-layer neural nets, each layer learns these structures at different levels of complexity. For example, an image recognition, the first layers will learn about the lines and edges in the image. The subsequent layers will learn how these lines and edges get together to form more complex shapes, like the eyes of an animal, and finally the last layers will learn how these more complex shapes form the final image. However, not every problem can suitably be decomposed using this hierarchical approach. Another issue with Deep Learning is that it is not yet completely understood how it works, and some areas, for example, banking, that are heavily regulated, may not be able to easily justify their predictions. Finally, many neural nets may require a heavier computational load than other classical machine learning techniques. Therefore, the reality is that one still needs a proficient machine learning expert who deeply understands the functioning of each approach and can make the best decision depending on each problem. Deep Learning is not, at the moment, a complete solution to any problem, and, in general, there can be no definite side to pick, and it really depends on the problem at hand. Deep learning can conquer tough challenges, no doubt. However, there are many common myths and realities around deep learning. Would you like to give your supporting reasoning on whether the following statements are myth or fact? You need to be a machine learning expert or a math geek to build deep learning models We need powerful hardware resources to use deep learning Deep learning models are always learning, they improve with new data automagically Deep learning is a black box, so we should avoid using it in production environments or in real-world applications. Deep learning is doomed to fail. It will be replaced eventually by data sparse, resource economic learning methods like meta-learning or reinforcement learning. Deep learning is going to be central to the progress of AGI (artificial general intelligence) research Deep Learning has become almost a buzzword, therefore a lot of people are talking about it, sometimes misunderstanding how it works. People hear the word DL together with "it beats the best player at go", "it can recognize things better than humans" etc., and people think that deep learning is a mature technology that can solve any problem. In actuality, deep learning is a mature technology only for some specific problems, you do not solve everything with deep learning and yet at times, whatever the problem, I hear people asking me "can't you use deep learning for it?" The truth is that we have lots of libraries ready to use for deep learning. For example, you don’t need to be a machine learning expert or a math geek to build simple deep learning models for run-of-the-mill problems, but in order to solve for some of the challenges that less common issues may present, a good understanding of how a neural network works may indeed be very helpful. Like everything, you can find a grain of truth in each of those statements, but they should not be taken at face value. With MLaaS being provided by many vendors from Google to AWS to Microsoft, deep learning is gaining widespread adoption not just within large organizations but also by data-savvy startups. How do you view this trend? More specifically, is deep learning being used differently by these two types of organizations? If so, what could be some key reasons? Deep Learning is not a monolithic approach. We have different types of networks, ANNs, CNNs, LSTMs, RNNs, etc. Honestly, it makes little sense to ask if DL is being used differently by different organizations. Deep Learning is a tool, not a solution, and like all tools it should be used differently depending on the problem at hand, not depending on who is using it. There are many open source tools and enterprise software (especially the ones which claim you don't need to code much) in the race. Do you think this can be the future where more and more people will opt for ready-to-use (MLaaS) enterprise backed cognitive tools like IBM Watson rather than open-source tools? This holds true for everything. At the beginning of the internet, people would write their own HTML code for their web pages, now we use tools who do most of the work for us. But if we want something to stand-out we need a professional designer. The more a technology matures, the more ready-to-use tools will be available, but that does not mean that we will never need professional experts to improve on those tools and provide specialized solutions. Deep learning is now making inroads to mobile, IoT and security domain as well. What makes DL great for these areas? What are some challenges you see while applying DL in these new domains? I do not have much experience with DL in mobiles, but that is clearly a direction that is becoming increasingly important. I believe we can address these new domains by building specialized chips. Deep learning is a deeply researched topic within machine learning and AI communities. Every year brings us new techniques from neural nets to GANs, to capsule networks that then get widely adopted both in research and in real-world applications. What are some cutting-edge techniques you foresee getting public attention in deep learning in 2018 and in the near future? And why? I am not sure we will see anything new in 2018, but I am a big supporter of the idea that we need a better paradigm that can excel more at inductive reasoning rather than just deductive reasoning. At the end of last year, even DL pioneer Geoff Hinton admitted that we need a better approach than back-propagation, however, I doubt we will see anything new coming out this year, it will take some time. We keep hearing noteworthy developments in AI and deep learning by DeepMind and OpenAI. Do you think they have the required armory to revolutionize how deep learning is performed? What are some key challenges for such deep learning innovators? As I mentioned before, we need a better paradigm, but what this paradigm is, nobody knows. Gary Marcus is a strong proponent of introducing more structure in our networks, and I do concur with him, however, it is not easy to define what that should be. Many people want to use the brain as a model, but computers are not biological structures, and if we had tried to build airplanes by mimicking how a bird flies we would not have gone very far. I think we need a clean break and a new approach, I do not think we can go very far by simply refining and improving what we have. Improvement in processing capabilities and the availability of custom hardware have propelled deep learning into production-ready environments in recent years. Can we expect more chips and other hardware improvements in the coming years for GPU accelerated deep learning and distributed training? What other supporting factors will facilitate the growth of deep learning? Once again, foreseeing the future is not easy, however, as these questions are related, I think only so much can be gained by improving chips and GPUs. We have a quantity vs. quality problem. We can improve quantity (of speed, memory, etc.) through hardware improvements, but the real problem is that we need a real quality improvement, better paradigms, and approaches, that needs to be achieved through research and not with hardware solutions. We can make faster machines, but our goal is really to make more intelligent machines. A child can learn by seeing just a few examples, we should be able to create an approach that allows a machine to also learn from few examples, not by cramming millions of examples in a short time. Would you like to add anything more to our readers? Deep Learning is a fascinating discipline, and I would encourage anyone who wanted to learn more about it to approach it as a research project, without underestimating his or her own creativity and intuition. We need new ideas. If you found this interview to be interesting, make sure you check out other insightful interviews on a range of topics: Blockchain can solve tech’s trust issues – Imran Bashir “Tableau is the most powerful and secure end-to-end analytics platform”: An interview with Joshua Milligan “Pandas is an effective tool to explore and analyze data”: An interview with Theodore Petrou
Read more
  • 0
  • 0
  • 7574
article-image-with-python-you-can-create-self-explanatory-concise-and-engaging-data-visuals-and-present-insights-that-impact-your-business-tim-grosmann-and-mario-dobler-interview
Savia Lobo
10 Nov 2018
6 min read
Save for later

“With Python, you can create self-explanatory, concise, and engaging data visuals, and present insights that impact your business” - Tim Großmann and Mario Döbler [Interview]

Savia Lobo
10 Nov 2018
6 min read
Data today is the world’s most important resource. However, without properly visualizing your data to discover meaningful insights, it’s useless. Creating visualizations helps in getting a clearer and concise view of the data, making it more tangible for (non-technical) audiences. To further illustrate this, below are questions aimed at giving you an idea why data visualization is so important and why Python should be your choice. In a recent interview, Tim Großmann and Mario Döbler, the authors of the course titled, ‘Data Visualization with Python’, shared with us the importance of Data visualization and why Python is the best fit to carry out Data Visualization. Key Takeaways Data visualization is a great way, and sometimes the only way, to make sense of the constantly growing mountain of data being generated today. With Python, you can create self-explanatory, concise, and engaging data visuals, and present insights that impact your business. Your data visualizations will make information more tangible for the stakeholders while telling them an interesting story. Visualizations are a great tool to transfer your understanding of the data to a less technical co-worker. This builds a faster and better understanding of data. Python is the most requested and used language in the industry. Its ease of use and the speed at which you can manipulate and visualize data, combined with the number of available libraries makes Python the best choice. Full Interview Why is Data Visualization important? What problem is it solving? As the amount of data grows, the need for developers with knowledge of data analytics and especially data visualization spikes. In recent years we have experienced an exponential growth of data. Currently, the amount of data doubles every two years. For example, more than eight thousand tweets are sent per second; and more than eight hundred photos are uploaded to Instagram per second. To cope with the large amounts of data, visualization is essential to make it more accessible and understandable. Everyone has heard of the saying that a picture is worth a thousand words. Humans process visual data better and faster than any other type of data. Another important point is that data is not necessarily the same as information. Often people aren’t interested in the data, but in some information hidden in the data. Data visualization is a great tool to discover the hidden patterns and reveal the relevant information. It bridges the gap between quantitative data and human reasoning, or in other words, visualization turns data into meaningful information. What other similar solutions or tools are out there? Why is Python better? Data visualizations can be created in many ways using many different tools. MATLAB and R are two of the available languages that are heavily used in the field of data science and data visualization. There are also some non-coding tools like Tableau which are used to quickly create some basic visualizations. However, Python is the most requested and used language in the industry. Its ease of use and the speed at which you can manipulate and visualize data, combined with the number of available libraries makes Python the best choice. In addition to all the mentioned perks, Python has an incredibly big ecosystem with thousands of active developers. Python really differs in a way that allows users to also build their own small additions to the tools they use, if necessary. There are examples of pretty much everything online for you to use, modify, and learn from. How can Data Visualization help developers? Give specific examples of how it can solve a problem. Working with, and especially understanding, large amounts of data can be a hard task. Without visualizations, this might even be impossible for some datasets. Especially if you need to transfer your understanding of the data to a less technical co-worker, visualizations are a great tool for a faster and better understanding of data. In general, looking at your data visualized often speaks more than a thousand words. Imagine getting a dataset which only consists of numerical columns. Getting some good insights into this data without visualizations is impossible. However, even with some simple plots, you can often improve your understanding of even the most difficult datasets. Think back to the last time you had to give a presentation about your findings and all you had was a table with numerical values in it. You understood it, but your colleagues sat there and scratched their heads. Instead had you created some simple visualizations, you would have impressed the entire team with your results. What are some best practices for learning/using Data Visualization with Python? Some of the best practices you should keep in mind while visualizing data with Python are: Start looking and experimenting with examples Start from scratch and build on it Make full use of documentation Use every opportunity you have with data to visualize it To know more about the best practices in detail, read our detailed notes on 4 tips for learning Data Visualization with Python. What are some myths/misconceptions surrounding Data Visualization with Python? Data visualizations are just for data scientists Its technologies are difficult to learn Data visualization isn’t needed for data insights Data visualization takes a lot of time Read about these myths in detail in our article, ‘Python Data Visualization myths you should know about’. Data visualization in combination with Python is an essential skill when working with data. When properly utilized, it is a powerful combination that not only enables you to get better insights into your data but also gives you the tool to communicate results better. Data nowadays is everywhere so developers of every discipline should be able to work with it and understand it. About the authors Tim Großmann Tim Großmann is a CS student with interest in diverse topics ranging from AI to IoT. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of big data engineering. He’s highly involved in different Open Source projects and actively speaks at meetups and conferences about his projects and experiences. Mario Döbler Mario Döbler is a graduate student with a focus in deep learning and AI. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of deep learning, using state-of-the-art algorithms to develop cutting-edge products. Currently, he dedicates himself to apply deep learning to medical data to make health care accessible to everyone. Setting up Apache Druid in Hadoop for Data visualizations [Tutorial] 8 ways to improve your data visualizations Getting started with Data Visualization in Tableau  
Read more
  • 0
  • 0
  • 7224

article-image-instead-of-data-scientists-working-on-their-models-and-advancing-ai-they-are-spending-their-time-doing-deepops-work-missinglink-ceo-yogi-taguri-interview
Amey Varangaonkar
08 Nov 2018
10 min read
Save for later

“Instead of data scientists working on their models and advancing AI, they are spending their time doing DeepOps work”, MissingLink CEO, Yosi Taguri [Interview]

Amey Varangaonkar
08 Nov 2018
10 min read
Machine learning has shown immense promise across domains and industries over the recent years. From helping with the diagnosis of serious ailments to powering autonomous vehicles, machine learning is finding useful applications across a spectrum of industries. However, the actual process of delivering business outcomes using machine learning currently takes too long and is too expensive, forcing some businesses to look for other less burdensome alternatives. MissingLink.ai is a recently-launched platform to fix just this problem. It enables data scientists to spend less time on the grunt work by automating and streamlining the entire machine learning cycle, giving them more time to apply actionable insights gleaned from the data. Key Takeaways Processing and managing the sheer volume of data is one of the key challenges that today’s AI tools face Yosi thinks the idea of companies creating their own machine learning infrastructure doesn’t make a lot of sense. Data professionals should be focusing on more important problems within their organizations by letting the platform take care of the grunt work. MissingLink.ai is an innovative AI platform born out of the need to simplify AI development, by taking away the common, menial data processing tasks from data scientists and allowing them to focus on the bigger data-related issues; through experiment management, data management and resource management. MissingLink is a part of the Samsung NEXT product development team that aims to help businesses automate and accelerate their projects using machine learning We had the privilege of interviewing Mr. Yosi Taguri, the founder and CEO of MissingLink, to know more about the platform and how it enables more effective deep learning. What are the biggest challenges that companies come across when trying to implement a robust Machine Learning/Deep Learning pipeline in their organization? How does it affect their business? The biggest challenge, simply put, is that today’s AI tools can’t keep up with the amount of data being produced. And it’s only going to get more challenging from here! As datasets continue to grow, they will require more and more compute power, which means we risk falling farther behind unless we change the tools we’re using. While everyone is talking about the promise of machine learning, the truth is that today, assessing data is still too time-consuming and too expensive. Engineers are spending all their time managing the sheer volume of data, rather than actually learning from it and being empowered to make changes. Let’s talk about MissingLink.ai, the platform you and your team have launched for accelerating deep learning across businesses. Why the name MissingLink? What was the motivation to launch this platform? The name is actually a funny story, and it ties pretty neatly into why we created the platform. When we were starting out three years ago, deep learning was still a relatively new concept and my team and I were working hard to master the intricacies of it. As engineers, we primarily worked with code, so to be able to solve problems with data was a fascinating new challenge for us. We quickly realized that deep learning is really hard and moves very, very slow. So we set out to solve that problem of how to build really smart machines really fast. By comparison, we thought of it through the lens of software development. Our goal was to accelerate from a glacial pace to building machine learning algorithms faster -- because we felt that there was something missing, a missing link if you will. MissingLink is a part of the growing Samsung NEXT product development team. How does it feel? What role do you think MissingLink will play in Samsung NEXT’s plans and vision going forward? Samsung NEXT’s broader mission is to help startups reach their full potential and achieve their goals. More specifically, Samsung NEXT discovers and backs the engineers, innovators, builders, and entrepreneurs who will help Samsung define the future of software and services. The Samsung NEXT product development team is focused on building software and services that take advantage of and accelerate opportunities related to some of the biggest shifts in technology including automation, supply and demand, and interfaces. This will require hardware and software to seamlessly come together. Over the past few years, nearly all startups are leveraging AI for some component of their business, yet practical progress has been slower than promised. MissingLink is a foundational tool to enable the progress of these big changes, helping entrepreneurs with great use cases for machine learning to accelerate their projects. Could you give us the key features of Missinglink.ai that make it stand out from the other AI platforms available out there? How will it help data scientists and ML engineers build robust, efficient machine learning models? First off, MissingLink.ai is the most comprehensive AI platform out there. It handles the entire deep learning lifecycle and all its elements, including code, data, experiments, and resources. I’d say that our top features include: Experiment Management: See and compare the entire history of experiments. MissingLink.ai auto-documents every aspect Data Management: A unique data store tracks data versions used in every experiment, streams data, caches it locally and only syncs changes Resources Management: Manages your resources with no extra infrastructure costs using your AWS or other cloud credentials. It grows and shrinks your cloud resources as needed. These features, together with our intuitive interface, really put data scientists and engineers in the driver's seat when creating AI models. Now they can have more control and spend less energy repeating experiments, giving them more time to focus on what is important. Your press release on the release of MissingLink states “the actual process of delivering business outcomes currently takes too long and it is too expensive. MissingLink.ai was born out of a desire to fix that.” Could you please elaborate on how MissingLink makes deep learning less expensive and more accessible? Companies are currently spending too much time and devoting too many resources to the menial tasks that are necessary for building machine learning models. The more time data scientists spend on tasks like spinning machines, copying files and DevOps, the more money that a company is wasting. MissingLink changes that through the introduction of something we’re calling DeepOps or deep learning operations, which allows data scientists to focus on data science and let the machine take care of the rest. It’s like DevOps where the role is about how to make the process of software development more efficient and productionalized, but the difference is no one has been filling this role and it’s different enough that its very specific to the task of deep learning. Today, instead of data scientists working on their models and advancing AI, they are spending their time doing this DeepOps work. MissingLink reduces load time and facilitates easy data exploration by eliminating the need to copy files through data-management in a version-aware data store. Most of the businesses are moving their operations on to the cloud these days, with AWS, Azure, GCP, etc. being their preferred cloud solutions. These platforms have sophisticated AI offerings of their own. Do you see AI platforms such as MissingLink.ai as a competition to these vendors, or can the two work collaboratively? I wouldn’t call cloud companies our competitors; we don’t provide the cloud services they do, and they don’t provide the DeepOps service that we do. Yes, we all are trying to simplify AI, but we’re going about it in very different ways. We can actually use a customer’s public cloud provider as the infrastructure to run the MissingLink.ai platform. If customers provide us with their cloud credentials, we can even manage this for them directly. Concepts such as Reinforcement Learning and Deep Learning for Mobile are getting a lot of traction these days, and have moved out of the research phase into the application/implementation phase. Soon, they might start finding extensive business applications as well. Are there plans to incorporate these tools and techniques in the platform in the near future? We support all forms of Deep Learning, including Reinforcement Learning. On the Deep Learning for Mobile side, we think the Edge is going to be a big thing as more and more developers around the world are exposed to Deep Learning. We plan to support it early next year. Currently, data privacy and AI ethics have become a focal point of every company’s AI strategy. We see tech conglomerates increasingly coming under the scanner for ignoring these key aspects in their products and services. This is giving rise to an alternate movement in AI, with privacy and ethics-centric projects like Deon, Vivaldi, and Tim Berners-Lee’s Solid. How does MissingLink approach the topics of privacy, user consent, and AI ethics? Are there processes/tools or principles in place in the MissingLink ecosystem or development teams that balance these concerns? When we started MissingLink we understood that data is the most sensitive part of Deep Learning. It is the new IP. Companies spend 80% of their time attending to data, refining it, tagging it and storing it, and therefore are reluctant to upload it to a 3rd party vendor. We have built MissingLink with that in mind - our solution allows customers to simply point us in the direction of where their data is stored internally, and without moving it or having to access it as a SaaS solution we are able to help them manage it by enabling version management as they do with code. Then we can stream it directly to the machines that need the data for processing and document which data was used for reproducibility. Finally, where do you see machine learning and deep learning heading in the near future? Do you foresee a change in the way data professionals work today? How will platforms like MissingLink.ai change the current trend of working? Right now, companies are creating their own machine learning infrastructure - and that doesn’t make sense. Data professionals can and should be focusing on more important problems within their organizations. Platforms like MissingLink.ai free data scientists from the grunt work it takes to upkeep the infrastructure, so they can focus on bigger picture issues. This is the work that is not only more rewarding for engineers to work on, but also creates the unique value that companies need to compete.  We’re excited to help empower more data professionals to focus on the work that actually matters. It was wonderful talking to you, and this was a very insightful discussion. Thanks a lot for your time, and all the best with MissingLink.ai! Read more Michelangelo PyML: Introducing Uber’s platform for rapid machine learning development Tesseract version 4.0 releases with new LSTM based engine, and an updated build system Baidu releases a new AI translation system, STACL, that can do simultaneous interpretation
Read more
  • 0
  • 0
  • 5813