ML compiler: watch the latest updates for today.
Exploring ML Compiler Optimizations with microTVM Gavin UBERTI, Software Engineer, OctoML Deep learning compilers can use optimization passes to make TinyML models run much faster. But what optimizations do they actually perform? In this talk, we’ll use Apache TVM to compile a MobileNetV1 model for Cortex-M microcontrollers. We’ll look inside its intermediate representations, and watch how they change when optimizations are applied. We’ll see how convolution kernels are tailored for the device, how quantization parameters are folded into subsequent operators, and how layouts are rewritten on the fly.
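The folding of quantization parameters into a subsequent operator, as mentioned in the abstract above, can be illustrated without TVM. The following is a minimal sketch in plain Python, assuming a dequantize step of the form `scale * (q - zero_point)` followed by a dense layer. The function names (`dequantize`, `dense`, `fold_dequantize_into_dense`) are hypothetical and are not TVM APIs; the talk's actual passes operate on TVM's intermediate representations.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map quantized integers back to real values."""
    return [scale * (v - zero_point) for v in q]


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Plain dense layer: one dot product per output row, plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fold_dequantize_into_dense(
    weights: Matrix, bias: List[float], scale: float, zero_point: int
) -> Tuple[Matrix, List[float]]:
    """Fold the dequantize step into the dense layer's constants.

    sum_i w_i * scale * (q_i - zp) + b
      == sum_i (w_i * scale) * q_i + (b - scale * zp * sum_i w_i)
    so the dense layer can consume the raw integers directly.
    """
    folded_w = [[w * scale for w in row] for row in weights]
    folded_b = [b - scale * zero_point * sum(row)
                for row, b in zip(weights, bias)]
    return folded_w, folded_b


if __name__ == "__main__":
    q = [1, 5, 7]
    weights = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
    bias = [0.25, -2.0]
    unfolded = dense(dequantize(q, 0.5, 3), weights, bias)
    fw, fb = fold_dequantize_into_dense(weights, bias, 0.5, 3)
    print(unfolded, dense(q, fw, fb))
```

The folded weights and bias are computed once at compile time, so the dequantize operator disappears from the graph that runs on the device.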
Episode 22 of the Stanford MLSys Seminar Series! Reshaping the ML software bedrock with compilers. Speaker: Jason Knight. Abstract: The rate of change for ML software, hardware, and algorithms improves our lives daily, but how sturdy are the foundations we rely on? Drawing on my experience at one of the first ML accelerator startups (Nervana), applying ML to biology and medicine, leading the ML SW product team at Intel, and then co-founding OctoML, I'll describe: 1) the pains of developing ML SW stacks for CPUs, GPUs and accelerators, and how these pains radiate outwards to both practitioners and hardware vendors; 2) how that led me to find the Apache TVM project, what it is, and why it matters; 3) the challenges and opportunities ahead for ML compilation and TVM specifically, and what it can enable for ML end users everywhere. Speaker bio: Jason Knight is co-founder and CPO at OctoML, building the machine learning acceleration platform for deploying ML anywhere. From the founders of the Apache TVM project, OctoML uses machine learning to generate efficient binaries for ML model deployment on any hardware. Before starting OctoML, Jason drove Intel’s AI software strategy, built large-scale human sequencing data pipelines in the biotech industry, and earned a PhD in machine learning and computational biology. The Stanford MLSys Seminar is hosted by Dan Fu, Karan Goel, Fiodar Kazhamiaka, Piero Molino, Chris Ré, and Matei Zaharia.
Talk given on Oct 21, 2020 for the internal Harvard offering of the Intro to TinyML course. Dr. Thierry Moreau is the co-founder of OctoML Inc., a Seattle-based startup that applies state-of-the-art ML-based techniques to make deep learning models faster and easier to put into production. Thierry has been a key contributor to Apache TVM, the open source machine learning compiler that started at the University of Washington, where Thierry got his Ph.D. TVM aims to make deep learning optimization and deployment easier across a wide range of hardware backends. Today, Thierry leads a team of engineers at OctoML that leverages TVM to seamlessly deploy ML models on various novel hardware platforms, ranging from data-center GPUs down to accelerator-rich smartphone SoCs and low-power microcontrollers.
From the "Practical AI" podcast.
This talk describes MLIR, a machine learning compiler infrastructure for TensorFlow, and explains how it helps TensorFlow scale faster to meet the needs of rapidly evolving machine learning software and hardware. Speaker: Jacques Pienaar, Software Engineer. Event: TensorFlow Dev Summit 2020.
A Tensor Compiler Approach for One-size-fits-all ML Prediction Serving. Supun Nakandala, UC San Diego; Karla Saur, Microsoft; Gyeong-In Yu, Seoul National University; Konstantinos Karanasos, Carlo Curino, Markus Weimer, and Matteo Interlandi, Microsoft. Machine Learning (ML) adoption in the enterprise requires simpler and more efficient software infrastructure; the bespoke solutions typical in large web companies are simply untenable. Model scoring, the process of obtaining predictions from a trained model over new data, is a primary contributor to infrastructure complexity and cost, as models are trained once but used many times. In this paper we propose Hummingbird, a novel approach to model scoring, which compiles featurization operators and traditional ML models (e.g., decision trees) into a small set of tensor operations. This approach inherently reduces infrastructure complexity and directly leverages existing investments in Neural Network compilers and runtimes to generate efficient computations for both CPU and hardware accelerators. Our performance results are intriguing: despite replacing imperative computations (e.g., tree traversals) with tensor computation abstractions, Hummingbird is competitive with, and often outperforms, hand-crafted kernels on micro-benchmarks on both CPU and GPU, while enabling seamless end-to-end acceleration of ML pipelines. We have released Hummingbird as open source. Presented at OSDI '20.
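To make the idea of replacing a tree traversal with tensor operations concrete, here is a minimal sketch in plain Python. It encodes one small, hand-written decision tree as five constant matrices and evaluates it with matrix products and element-wise comparisons only. The tree, the matrix names `A` to `E`, and the helper names are assumptions made for illustration; this is not Hummingbird's API, and the abstract above does not specify the encoding.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product, enough for a small illustration."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical tree: node0 tests x[0] < 0.5; its left child node1 tests
# x[1] < 2.0. Leaves: leaf0 (left of node1), leaf1 (right of node1),
# leaf2 (right of node0).
A = [[1, 0], [0, 1]]          # feature f is tested by internal node i
B = [0.5, 2.0]                # threshold of each internal node
C = [[1, 1, -1], [1, -1, 0]]  # +1: leaf in left subtree, -1: right, 0: unrelated
D = [2, 1, 0]                 # number of "left" ancestors on the path to each leaf
E = [[1, 0], [0, 1], [1, 0]]  # leaf -> one-hot class


def predict_tensor(batch: Matrix) -> List[int]:
    """Score a batch using only matmuls and element-wise comparisons."""
    tested = matmul(batch, A)
    goes_left = [[1 if v < t else 0 for v, t in zip(row, B)] for row in tested]
    path_score = matmul(goes_left, C)
    leaf_hit = [[1 if v == d else 0 for v, d in zip(row, D)] for row in path_score]
    class_scores = matmul(leaf_hit, E)
    return [row.index(max(row)) for row in class_scores]


def predict_traversal(x: List[float]) -> int:
    """Reference: the same tree written as imperative branches."""
    if x[0] < 0.5:
        return 0 if x[1] < 2.0 else 1
    return 0


if __name__ == "__main__":
    batch = [[0.2, 1.0], [0.2, 3.0], [0.9, 1.0], [0.9, 3.0]]
    print(predict_tensor(batch), [predict_traversal(x) for x in batch])
```

Every input evaluates every node, which is redundant work compared with a traversal, but the computation is branch-free and batched, so it maps onto the kernels that neural network runtimes already optimize.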
Speaker: Mangpo Phothilimthana, Research Scientist, Google Brain. Thursday January 12, 16:00-17:00. Search-based techniques have proven effective in solving complex optimization problems that arise in domain-specific compilers for machine learning (ML). Unfortunately, deploying such techniques in production compilers is impeded by several limitations. In this talk, I will present an autotuner for production ML compilers that can tune both graph-level and subgraph-level optimizations at multiple compilation stages. We demonstrate how to incorporate machine learning techniques such as a learned cost model and various learning-based search strategies to reduce autotuning time. Our learned cost model has high accuracy and outperforms a heavily-optimized analytical performance model. In an evaluation across 150 ML training and inference models on Tensor Processing Units (TPUs), the autotuner offers up to 2.4x and an average 5% runtime speedup over the heavily-optimized XLA compiler. I will outline how we deploy the XLA autotuner at datacenter scale to automatically tune the most heavily-used production models in Google’s fleet every day.
Want to build your own programming language? LLVM is a tool for building and optimizing compilers and forms the backbone of many languages like Rust, Swift, CUDA, C, and C++. Topics Covered - LLVM tutorial - What is LLVM? - Who created LLVM? - How to build a programming language from scratch - What are the main parts of a compiler? - How do compilers work? - Lexer vs Parser
Originally developed by Facebook, the open source Glow compiler is now available in NXP’s eIQ™ Machine Learning Software Development Environment delivering high performance inferencing for NXP’s i.MX RT series of crossover MCUs. Get started using the Glow neural network compiler running on the i.MX RT1060 evaluation kit. Viewers will be guided through examples while learning how to use this compiler tool to turn a model into a machine executable binary for a target device. This step-by-step guide will help make ML application development easier. The labs and example projects shared in this training can be run on other i.MX RT platforms. NXP is the first semiconductor company to deliver a 2-3x performance jump for MCUs over the standard version of Glow.
A New Match Compiler for Standard ML of New Jersey David MacQueen (University of Chicago (Emeritus)) (presented by John Reppy) This abstract describes an implementation of a match compiler (for SML/NJ) that is the latest in an evolution of designs going back to the mid 1980s. It summarizes a more detailed technical report (in preparation) documenting the match compiler and its source code.
Full title: Breaking Boundaries: Advancements in High-Performance AI/ML through PyTorch's Python Compiler. By Sam Gross, Justin Jeffress and Suraj Subramanian. Sponsor: Meta. As GPUs continue to become faster, PyTorch, one of the most widely used frameworks in AI/ML, has faced challenges keeping up with performance demands. To mitigate this, parts of PyTorch have been moved into C++. This approach goes against the original intent of PyTorch as a Python-based framework and complicates contributions from the community. The PyTorch development team recognized the need to address these challenges while maintaining PyTorch's Python roots and set ambitious goals to improve performance, decrease memory usage, enable state-of-the-art distributed capabilities, and ensure more PyTorch code is written in Python. To achieve these goals, they developed a Python compiler. Attendees of this talk will get an inside look at how the PyTorch development team approached these challenges and implemented their solution to achieve a 43% speedup in performance. We will discuss the benefits and challenges of this approach, as well as the techniques and technologies used to build the PyTorch Python compiler. This talk will provide insights into the development process and offer attendees a deeper understanding of how PyTorch continues to evolve and innovate.
Lex Fridman Podcast. GUEST BIO: Andrej Karpathy is a legendary AI researcher, engineer, and educator. He's the former director of AI at Tesla, a founding member of OpenAI, and an educator at Stanford.
Albert Cohen's keynote talk for the ISC2020's International Workshop on Machine Learning Hardware.
Julia is a dynamic general purpose programming language popular for scientific computing and big data analytics. It is extremely fast thanks to its use of a JIT compiler and allows developers to write concise, yet powerful code. Topics Covered - Julia basics tutorial - Julia vs Python - Julia vs R - what is Julia used for? - who created Julia? - parametric types and polymorphism explained - Julia type system explained - is Julia easy to learn?
ML Family Workshop at ICFP 2014, Gothenburg, Sweden.
In this video, we will give you an overview of our AI Compiler, our execution system. The first piece of the Compiler is the Router, a database of labeling sources, which can be humans, machines, or software functions. The Combiner uses a generative model to combine the various labeling sources. The Trainer uses the Meta AI module, takes the labeled data, and trains new ML models. The more data you label, the more machine models label your tasks. The BPU unit trains new humans to keep them motivated. The last step is the Measurement and Q&A system. If you'd like to find out more about the platform, check out the other videos in the series. If you want to talk to our team, visit super.ai/demo to get started.
tinyML Talks local Webcast, recorded January 19, 2021. "microTVM: a Tensor Compiler for Bare Metal", Andrew Reusch, OctoML. TVM is an open source compiler and deployment framework for tensor-oriented models. It can compile models from popular frameworks (PyTorch, TensorFlow, ONNX, etc.) onto a range of devices. microTVM is a recent effort that allows TVM to target bare metal devices. In this talk, we'll explain how microTVM works and give a short demonstration. We'll also discuss future directions for microTVM and ways you can get involved.
In this video, we talk about why GPUs are better suited for parallelized tasks. We go into how a GPU is better than a CPU at certain tasks. Finally, we set up the NVIDIA CUDA programming packages to use the CUDA API in Visual Studio. GPUs are a great platform for executing code that can take advantage of massive parallelization. For example, in this video we show the difference between adding vectors on a CPU and adding vectors on a GPU. By taking advantage of the CUDA parallelization framework, we can do mass addition in parallel.
[AI-S2oC] 2022.6 Symposium | ML for Low Power Compiler Optimization - 장준서, B.S. Student, SNU. The performance of written code depends greatly on compiler optimizations. While the -O3 options in the LLVM and GCC compilers are well known for generating highly optimized code, recent studies show that compiler optimization using ML approaches generates faster and more efficient intermediate code. These approaches use existing compiler passes and search for the combinations and orderings of passes that maximize performance. In this talk, we will introduce the motivations for ML compiler optimizations, focusing especially on studies using RL approaches. We then present our ongoing research, which aims to develop an ML model for compiler optimization that generates low-power intermediate code.
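The search over pass orderings described above can be sketched on a toy scale. The example below assumes a made-up three-address IR, three hand-written passes, and instruction count as the cost; it uses exhaustive search over permutations as a simple stand-in for the RL-based search the talk refers to. None of the names correspond to LLVM or GCC passes, and a low-power objective would need a different cost function.

```python
from itertools import permutations

# Toy IR: (dest, op, args). Args are variable names (str) or int literals.
PROGRAM = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),
    ("d", "add", ("c", 1)),
    ("ret", "return", ("d",)),
]


def propagate_constants(prog):
    """Replace uses of variables that are already known constants."""
    consts, out = {}, []
    for dest, op, args in prog:
        args = tuple(consts.get(a, a) if isinstance(a, str) else a for a in args)
        if op == "const":
            consts[dest] = args[0]
        out.append((dest, op, args))
    return out


def fold_constants(prog):
    """Evaluate additions whose operands are all literals."""
    return [(dest, "const", (sum(args),))
            if op == "add" and all(isinstance(a, int) for a in args)
            else (dest, op, args)
            for dest, op, args in prog]


def eliminate_dead_code(prog):
    """Drop instructions whose result is never used (backward liveness scan)."""
    live, kept = set(), []
    for dest, op, args in reversed(prog):
        if op == "return" or dest in live:
            live.update(a for a in args if isinstance(a, str))
            kept.append((dest, op, args))
    return list(reversed(kept))


def apply_passes(prog, order):
    for compiler_pass in order:
        prog = compiler_pass(prog)
    return prog


def search_best_order(prog, passes):
    """Try every ordering and keep the one with the fewest instructions."""
    return min(permutations(passes),
               key=lambda order: len(apply_passes(prog, order)))


if __name__ == "__main__":
    passes = (propagate_constants, fold_constants, eliminate_dead_code)
    best = search_best_order(PROGRAM, passes)
    print([p.__name__ for p in best], len(apply_passes(PROGRAM, best)))
```

Running dead-code elimination first removes nothing, because every value is still in use; running it after constant propagation removes two instructions. Exhaustive search stops being feasible once there are more than a handful of passes, which is the motivation for learned search strategies.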
Session #2 Title: CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research Authors: Chris Cummins (Facebook), Bram Wasti (Facebook), Jiadong Guo (Facebook), Brandon Cui (Facebook), Jason Ansel (Facebook), Sahir Gomez (Facebook), Somya Jain (Facebook), Jia Liu (Facebook), Olivier Teytaud (Facebook), Benoit Steiner (Facebook), Yuandong Tian (Facebook), Hugh Leather (Facebook)
Mojo is a new LLVM programming language designed as a superset of Python with the low-level performance of C. It is optimized to run on GPUs with CUDA and other exotic hardware for deep learning and Artificial Intelligence. Topics Covered - What is Mojo Language? - Best Python alternative - Is python slow? - Futuristic AI programming languages - Is mojo a python killer? - Mojo vs Python - Mojo vs Rust - Mojo vs C
Speaker: Chris Cummins. Venue: Proceedings of the 38th International Conference on Machine Learning (ICML'21). Abstract: Machine learning (ML) is increasingly seen as a viable approach for building compiler optimization heuristics, but many ML methods cannot replicate even the simplest of the data flow analyses that are critical to making good optimization decisions. We posit that if ML cannot do that, then it is insufficiently able to reason about programs. We formulate data flow analyses as supervised learning tasks and introduce a large open dataset of programs and their corresponding labels from several analyses. We use this dataset to benchmark ML methods and show that they struggle on these fundamental program reasoning tasks. We propose ProGraML (Program Graphs for Machine Learning), a language-independent, portable representation of program semantics. ProGraML overcomes the limitations of prior works and yields improved performance on downstream optimization tasks.
In this GroqDay Groq™ Compiler update, learn about our software-hardware co-design in this presentation by Andrew Ling, PhD, Sr. Director, Software Engineering & ML Compiler and Andrew Bitar, Compiler Tech Lead & Manager. Register and watch the full GroqDay event on demand at groq.link/groqday4.
LLVM Social Bangalore is a casual gathering of LLVM hackers. We welcome LLVM, Clang, and other LLVM sub-project hackers. That being said, we are always open to people working in or around compilers, not necessarily LLVM. Title: Deep-learning based Compiler Optimizations. Abstract: Over the last few years, machine learning, in particular deep learning, has made great strides and has been successfully deployed in many real-world applications. One of the recent applications of deep learning has been in the area of software development and programming itself. In addition, DL-based encoding and optimization has been gaining traction in the compiler development and research community. In this talk we will highlight some of the recent work that applies DL techniques to compiler heuristics and optimizations like vectorization and register allocation, as well as end-to-end analyses of programs for predicting certain program characteristics. We will also talk about some of the challenges of using DL for compiler optimizations. Speaker: Dr. Dibyendu Das. Speaker's bio: Dr. Dibyendu Das is currently a Sr. Principal Engineer at Intel Corporation. His interest lies in the enablement of AI in compiler algorithms and in using ML/DL algorithms for molecular drug design. Before joining Intel, he headed the compiler team at AMD that developed AOCC, AMD's premium optimizing compiler, which achieved world record SPEC benchmark scores. Early in his career, he spent many years at HP and IBM working on compilers, computer architecture and program performance understanding. Apart from his industry career, he holds a Bachelor's degree in Computer Science from Jadavpur University, a Master's degree from IISc, and a PhD from IIT Kharagpur. 00:06 - Introduction & LLVM NEWS 13:27 - Deep Learning Based Compiler Optimization
Abstract: Compilers are the workhorses that bridge the gap between human-readable and machine-executable code. The diversity of modern programs, along with the advent of new and complex hardware architectures, has strained the capabilities of current compilers, making development and maintenance of automatic program optimizations in compilers exceedingly challenging. In spite of this, modern compiler optimizations are still hand-crafted using technology that existed decades ago and usually make optimization decisions considering an abstract machine model. It is high time that we modernize our compiler toolchains using more automated decision procedures to make better optimization decisions while reducing the expertise required to build and maintain compiler optimizations. In this talk, I will show how we can leverage the changes in the computing environment to modernize compiler optimizations, using auto-vectorization (automatic conversion of scalar code into vector code) as an example. First, I will demonstrate how we can take advantage of modern solvers and computing platforms to perform vectorization. Modern compilers perform vectorization using hand-crafted algorithms, which typically only find local solutions under linear performance models. I present goSLP, which uses integer linear programming to find a globally optimal instruction packing strategy to achieve superior vectorization performance. Next, I will discuss how to modernize the construction of compiler optimizations by automatically learning the optimization algorithm. I present Vemal, the first end-to-end learned vectorizer, which eliminates the need for hand-writing an algorithm. The key is to formulate the optimization problem as a sequential decision-making process in which all steps guarantee correctness of the resultant generated code.
Not only does Vemal reduce the need for expert design and heuristics, but it also outperforms hand-crafted algorithms, reducing developer effort while increasing performance. Finally, I will show how we can use data to learn better non-linear performance models, rather than the complex and incorrect hand-crafted models designed by experts, to enhance the decision procedure used in Vemal. I present Ithemal, the first learned cost model for predicting throughput of x86 code. Ithemal more than halves the error rate of complex analytical models such as Intel’s IACA. Both Vemal and Ithemal achieve state-of-the-art results and pave the way towards developing more automated and modern compiler optimizations with minimal human burden. Bio: Charith Mendis is a final-year PhD candidate in the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. He will be starting as an Assistant Professor of Computer Science at the University of Illinois Urbana-Champaign (UIUC) in Fall 2021. His research interests include Compilers, Machine Learning and Program Analysis. He completed his Master’s degree at MIT, for which he received the William A. Martin Thesis Prize, and his bachelor’s degree at the University of Moratuwa, Sri Lanka, for which he received the institute Gold Medal. Charith was the recipient of the best student paper award at the IEEE Big Data conference and the best paper award at the ML for Systems workshop at ISCA. He has published work at top programming language venues such as PLDI and OOPSLA as well as at top machine learning venues such as ICML and NeurIPS. Charith’s recent work on performance prediction is used at Google as part of their CPU modeling effort.
Speaker: Jinliang Wei, Google. October 13th, 2022. Description: Launched in 2022 by MBZUAI, The AI Quorum is a winter series of gatherings designed to stimulate cutting-edge AI research with leading scientists and share an understanding of the discipline as a force for good. The series is strategic and high-level by design. The AI Quorum is focused on curiosity, collaboration, authenticity, and the pursuit of excellence; it is a coming together of the brightest minds to set the research agenda and to imagine both what AI could accomplish and how it might get there. Theme: Building Ecosystems for AI at Scale. AI at scale, made famous by OpenAI's "GPT-3" in 2020, is powering AI use cases that were previously thought impossible, such as synthetic art, automatic software programming, and drug design. Our workshop brings experts in AI at scale to the region, who will map out steps that organizations should take to become self-sufficient at these applications, and to explore new business use cases. The program has two sessions: the Research Session on Thursday Oct 13th and the Industry Session on Friday Oct 14th, available to the public via livestream. About CASL: The Composable, Automatic and Scalable Learning (CASL) organization brings together technologists who envision that the broader public (the “power of many”) can collaboratively build Artificial Intelligence as easily as stacking “Lego” blocks. We explore how AI can be built in a holistic, “all levels” approach that covers data, models, training/inference algorithms, meta-learning, parallelization, scheduling, resource management and software infrastructure. About MBZUAI: Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) is a graduate research university dedicated to advancing AI as a global force for humanity.
Established to educate and develop top talent, foster an innovation ecosystem, and act as a strategic think tank for the public and private sectors, the university has a vital role to play in many of the UAE Government’s strategic objectives, with AI identified as a critical component for future growth and prosperity. The university attracts world-class thinkers and doers in computer vision, machine learning, natural language processing, and beyond.
PyTorch is a deep learning framework used to build artificial intelligence software with Python. Learn how to build a basic neural network from scratch with PyTorch 2. Topics Covered - What is PyTorch? - PyTorch vs Tensorflow - Build a basic neural network with PyTorch - PyTorch 2 basics tutorial - What is a tensor? - Which AI products use PyTorch?
LLVM is a "bytecode"-like language and ecosystem that allows high-level languages to be compiled into a common representation, which can then be further compiled to native executables for the target architecture. In this talk, we will go over the basics of what LLVM is, how it works at a deeper level, and how MethodScript is planning on using it going forward, along with plenty of code examples.
DL Compiler Study #1. Paper: TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, OSDI '18. Presenter: Constant Park (sonicstage12@naver.com).
Monthly AI Tech talk in San Francisco.
tinyML Research Symposium 2021. Compiler Toolchains for Deep Learning Workloads on Embedded Platforms. Max SPONNER. As the usage of deep learning becomes increasingly popular in mobile and embedded solutions, it is necessary to convert the framework-specific network representations into executable code for these embedded platforms. This paper starts with a survey and benchmark of the available open source deep learning compiler toolchains, which focuses on the capabilities and performance of the toolchains with regard to targeting embedded microcontrollers that are combined with a dedicated accelerator in a heterogeneous fashion. The second part focuses on the implementation and evaluation of a compilation flow that targets such a solution and reuses one of the existing toolchains to demonstrate the necessary steps for hardware developers to build a software flow for their product.
XiaFei Qiu, Senior Technical Specialist, Ali Cloud Computing Co., Ltd. The Ali Cloud PAI team is responsible for the internal and external AI infrastructure of Ali Group, and model system optimization has always been one of the team’s key technical directions. The compiler, an important means of system optimization, has been refined internally over several years and is now being open-sourced. Other talks from the team include: "Easier and More Reliable to Use and More Robust TensorRT via Pai-blade"; "Ali Cloud universal transparent performance solution based on AI compiler"; "AStitch: Enabling A New Multi-Dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures", Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, and Wei Lin, ASPLOS 2022; and "Model optimization (compilation technology) introduction and best practice sharing of Aurora advertising system". With the continuous development of deep learning, model structures keep evolving and new computing hardware keeps emerging. Developers must consider not only how to use computing power effectively in complex and changing scenarios, but also how to deal with the continuous iteration of computing frameworks. Deep learning compilers are the technical direction that the industry has explored in practice to solve these problems. They allow developers to focus only on upper-level model development, reduce the labor cost of manual optimization, and further squeeze out hardware performance. A number of deep learning compilers, represented by TensorFlow XLA and Apache TVM, have emerged in the industry. But as deep learning develops, more and more models exhibit dynamic tensor shapes.
For example: 1) CNN models that support input images of various sizes; 2) NLP models, represented by BERT, that are dynamic in both the batch size and sequence length dimensions. However, these compilers were originally designed for static shape scenarios, which require inputs and tensors to have a fixed size in every dimension, so they do not work well for dynamic shape scenarios. In response to strong business demand, work such as the TVM Relay VM has also appeared in the deep learning community. This talk mainly introduces the Aliyun PAI team’s work on a dynamic shape compiler, centered on BladeDISC, including the following contents. BladeDISC’s main architecture: why and how to build a dynamic shape compiler on the MLIR framework, why BladeDISC chose MHLO as the entry-level dialect, and what modifications BladeDISC is making to the MHLO dialect. What are the differences in technical approach compared to Apache TVM, and what are the design considerations behind them? Challenges brought by dynamic shapes: many problems that are deterministic under static shape semantics, such as instruction-level vectorization, codegen template selection, and whether implicit broadcast is needed, become more complex in dynamic shape scenarios. Some compile-time decisions need to be moved to runtime. Large-granularity operator fusion: how to achieve larger-scale operator fusion through storage with low access cost (shared memory on GPU, memory cache on CPU) under dynamic shape semantics. And how can constraints between tensor shapes be propagated even when the shapes themselves are unknown? Computation-intensive operators: how to improve the efficiency of computation-intensive operators by using vendor libraries, Apache TVM operator generation, hand-written operators and other means, and how to make operators that are not generated by the compiler fit together with the compiler in one architecture.
How to support a variety of deep learning frameworks and lower the barrier to entry for end users, and how BladeDISC is applied in Ali Cloud’s business.
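The point that some compile-time decisions move to runtime under dynamic shapes can be shown with a small sketch. It assumes one-dimensional inputs and an element-wise add, and compares a kernel specialized for a fixed length with one that decides at call time whether implicit broadcast is needed. The function names are hypothetical and unrelated to BladeDISC, MHLO, or TVM.

```python
from typing import Callable, List

Vector = List[float]


def add_same_shape(a: Vector, b: Vector) -> Vector:
    """Fast path: both operands have the same length."""
    return [x + y for x, y in zip(a, b)]


def add_broadcast(a: Vector, single: Vector) -> Vector:
    """Broadcast path: a length-1 operand is added to every element."""
    return [x + single[0] for x in a]


def compile_static_add(n: int) -> Callable[[Vector, Vector], Vector]:
    """Static shapes: the length is known up front, so the kernel is fixed."""
    def kernel(a: Vector, b: Vector) -> Vector:
        if len(a) != n or len(b) != n:
            raise ValueError(f"kernel was specialized for length {n}")
        return add_same_shape(a, b)
    return kernel


def compile_dynamic_add() -> Callable[[Vector, Vector], Vector]:
    """Dynamic shapes: the broadcast decision is deferred to call time."""
    def kernel(a: Vector, b: Vector) -> Vector:
        if len(a) == len(b):
            return add_same_shape(a, b)
        if len(b) == 1:
            return add_broadcast(a, b)
        if len(a) == 1:
            return add_broadcast(b, a)
        raise ValueError("shapes are not broadcast-compatible")
    return kernel


if __name__ == "__main__":
    dynamic_add = compile_dynamic_add()
    print(dynamic_add([1.0, 2.0, 3.0], [10.0]))
    print(dynamic_add([1.0, 2.0], [10.0, 20.0]))
```

The dynamic kernel pays for a shape check on every call and must ship both code paths, which is a small-scale version of the added complexity the abstract describes.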