Fireside Chat w/ Clement Delangue

Hello, and thanks, everybody, for coming out tonight. It's a packed house: I think something like a thousand people wanted to attend, so people are clearly very excited to see Clem, and there's ever-growing enthusiasm for AI. Thanks so much for making it. I'd also like to quickly thank Edwin Lee, Ali Pavilan, Emily, and the Stripe AV, event security, and food and catering teams.

Thank you so much for putting on this event tonight and hosting everybody. We're going to be talking about Clem's background and origins, so I'll keep the intro really brief. Clem is the CEO and co-founder of Hugging Face, which is really one of the main pieces of infrastructure that everybody uses in the AI industry. He's been working on AI for about 15 years now, is originally from France, and has been in the US for about 10 years. So welcome, and thank you so much for joining us today.

Thanks for having me. Excited to be able to chat.

Okay, so could you tell us a little about the origins of Hugging Face: how you got started, what it was originally, and how it morphed into what it is today?

Yeah, absolutely. As you said, I've been working on AI for quite a while, since before it was as sexy, as hot, and as mainstream as it is today. I think that's what brought the three co-founders of Hugging Face together: the idea that AI was becoming a new paradigm for building technology. We were really excited about that when we started the company.

We wanted to work on something that was scientifically challenging, because that's the background of one of our co-founders, Thomas, but at the same time something fun. So we actually started by building an AI Tamagotchi: something like ChatGPT, but really focused on fun and entertainment. At the time there were Siri and Alexa, but we thought it was pretty boring to focus only on productivity answers. We did that for almost three years and raised our pre-seed and seed rounds on the idea. Some users really liked it; they exchanged a couple of billion messages with it, organically, and I can tell that story later. We pivoted from that to what we are right now, which is the most used open platform for AI.

How did you get interested in AI to begin with? You started working in the area 15 years ago, and AI has gone through different waves of popularity: AlexNet sparked a lot of interest, then there was the CNN and RNN world. Did you start even before that, or when did you first get interested?

Yes, at the time we weren't even calling it AI or machine learning. The first startup I worked for was a company called Moodstocks, and we were doing machine learning for computer vision on device.

We were building technology that let you point your phone at an object and recognize it, and even at the time what you could do with it was mind-blowing. For me, the realization of how AI could really unlock new capabilities came when I met the founders of this startup. I was working at eBay at the time, and they told me: "You acquired this company called RedLaser that does barcode recognition, so you can recognize objects and pull up the eBay page. You guys suck. You should use machine learning, and instead of recognizing the barcode, you can recognize the object itself." I was like, "You're crazy, it's impossible. You can't do that with traditional software, you can't do that with code. There are too many objects; the possibilities are just too broad." And they were actually managing to do it with some form of machine learning. That's when I realized, wow, you can do so many new things with this new technology. That really led me to where I am today.

That's cool. So you then started Hugging Face, intending to build an AI Tamagotchi. I think it's funny: you used to say "AI" and people would sneer at you and say, "No, no, it's machine learning."

And I feel like the lingo has shifted back to AI again, given what some of these systems can do. What made you decide to move in the very different direction of what Hugging Face is today?

Yes, it was very organic, with one of these founding moments. It's good that we're at Stripe, because I think it was Patrick Collison who first talked about the importance of not just founding a company but having founding moments that change its trajectory. For us that happened thanks to Thomas Wolf, one of our co-founders. I think it was a Friday night, and he said: "I've seen this thing called BERT that was released by Google, but it kind of sucks because it's in TensorFlow. I think I'm going to spend the weekend porting it to PyTorch." And we were like, "Yeah, you do you, have fun with your weekend." On Monday he came back and said, "Okay, I'm going to release it." He released it on GitHub, tweeted about it, and we got something like a thousand likes. We were French nobodies at the time, so we were like: what's happening? Why are people liking this very specific, very niche, very technical tweet about a PyTorch port of BERT? There's something there. So we kept exploring it: we joined him and started adding other models to the GitHub repository.

And the community came together. People started fixing bugs for us in the repository, and we were like, why are people doing that? They started adding models, for example the first GPT, then the next models as they were released, and very quickly we ended up with one of the most popular GitHub repositories for AI. That's what transitioned us from our first idea to where we are now.

Could you describe for people, although I'm sure most already know, what Hugging Face is today, how it's used, and the importance of the product, the platform, and the ecosystem?

Yes. We're now lucky to be the most used open platform for AI. As mentioned before, you can think of it as a sort of GitHub for AI. The same way GitHub is the platform where companies host, collaborate on, share, and test code, we're the same, but for machine learning artifacts. More than a million repositories have been hosted on the Hugging Face platform. There are models, most of them open source: maybe you've heard of Stable Diffusion, T5, BERT originally, obviously BLOOM, or Whisper for audio. There are data sets: over 20,000 open data sets that you can use on the platform. And there are demos: over 100,000 demos are hosted on the platform. More than 15,000 companies are using the platform to bring AI into their features, their products, or their workflows.

Some of the most popular questions on Dory, or through the Airtable form people filled out, were about future directions. Given how central Hugging Face is, there are so many directions it could go, everything from bespoke B2B hosting to tooling to other types of products or activities. What are some of the major directions you're pursuing currently?

In terms of product, I would say there are two main directions we're following right now. The first is that AI is turning from a niche technique that solves some problems into the default paradigm for building all tech. For us, that means going from text, which is heavily used on the platform right now, and text-to-image, which is also heavily used, to expanding into every single domain. For example, last week we started to see the first open-source text-to-video models. We're starting to see a lot of time-series models on the platform, for things like financial prediction or your ETA when you order an Uber. We're also seeing more and more biology and chemistry models. So making sure we support these broadening use cases for AI is one direction. The second is making it easier for everyone to build AI, including software engineers. Historically, our platform has been designed more for machine learning engineers, people who are training, optimizing, or assessing models.

What we're seeing now, especially with the AI APIs, is that everyone wants to do AI, even software engineers, product managers, and infrastructure engineers. So a big focus of ours, including some of the things we've released in the past few weeks and will keep releasing, is reducing the barrier to entry for using our platform, because ultimately we think every single company or team should be able to use open source to train their own models. Everyone today is talking about ChatGPT and GPT-4, but I think in a few months or a few years, every single company is going to build and train its own GPT-4. If you think about it, today every company has its own code repository; there are as many code repositories as there are companies. We think tomorrow every single company is going to have its own models and its own machine learning capabilities, not outsourced to someone else, capabilities that will allow it to differentiate itself and cater to its specific audience or use cases.

It's interesting, because when you talk about the future, one thing I'm really struck by is that, looking back over the course of my career, there have been a small number of very large paradigm or platform shifts. There was the internet, which was obviously a huge transition in terms of bringing everybody online. A few years later we ended up with mobile and cloud, so suddenly you could host anything anywhere, and people could access any product from anywhere in the world. Crypto, I feel, is almost a side branch that went down the financial services route but didn't become a true platform, at least not yet in terms of compute. And now we have AI. It feels like with each platform shift, three or four things change. The input and output of how you program a system shifts in some ways, or at least the types of data you deal with.

User accessibility and UI shift: how you actually interface with something was different on mobile than on the desktop. And the size and magnitude of the implications of the shift are massive. So if we view AI as a new platform: you mentioned everybody will have their own form of GPT-4, and it seems like the nature of programming itself may change at some point. We can put aside the whole question of whether we're also creating a digital species; maybe we'll talk about that at the end. But how does Hugging Face aim to play a role in this massive platform transition?

Yeah, the way we see things, we really like Andrej Karpathy's analogy of Software 1.0, which is the methodology we've been building technology with for the past 15 years, and now AI, which is Software 2.0. It's a new methodology, a new way of building all technology, the new default paradigm. And if you think of it that way, this new paradigm needs better, more adapted tools, and it needs better communities: ways for teams to collaborate and for the whole ecosystem to collaborate. That's what we're trying to provide: new tooling, a new collaborative platform to build AI better. We're also trying to build a future that we're excited about. I think a lot of people are scared about AI right now, about its potential and the risks associated with it. The way we think about it, if you can build a future where everyone is able to understand and build AI, you remove a lot of these risks because you involve more people, so you reduce, for example, the probability of very biased systems. You give regulators the tools to actually put safeguards in place, and you give companies the capability to align the systems they use, and provide to their users and customers, with their values, which is ultimately what you want.

You want Stripe to be able to say: these are our values, and this is how we're building AI in alignment with them. So that's also something important we're trying to do. We sometimes say that our mission is to democratize good machine learning, and we're working really hard on that because we think it's important for the world.

It feels like Hugging Face has always been very consistent about wanting ethical AI, or approaches to it that are strong on alignment. A number of companies have their own approaches; for example, Anthropic has constitutional AI, where they basically provide a constitution during training that governs the activities or actions of the resulting model. What other approaches do you think work best, and what do you hope people are doing more of relative to alignment?

Alignment is a complicated term because it means different things to different people. It can be taken from a heavy ethical standpoint, in terms of alignment between values and systems. A lot of people today use it to mean accuracy improvement: to be honest, when they do alignment work, they're often actually making the models more accurate thanks to reinforcement learning from human feedback. So it's hard to debate around that. In general, in my opinion, you can't control, improve, and align a system that you don't understand.

So the main thing we're trying to push at Hugging Face is more transparency about how these systems are built: what data they're trained on, what their limitations are, what their biases are. I think if you create more transparency around that, you almost create a system that is more ethical at its core. That's the biggest thing we're focusing on.

What is your biggest concern in terms of how open-source AI could be misused or abused?

There are a lot of things that can be dangerous with AI, however it's distributed, through APIs or open source. The biggest thing is dual use: when someone uses a model in a way that isn't the intended use the model builders defined. One thing we've been experimenting with, which is super early and probably not a solution to everything, is creating new forms of licenses for models. We've been supporting something called RAIL and OpenRAIL, Responsible AI Licenses. These are meant to be open licenses that let everyone use the model but that define uses prohibited by the model authors, as a way to create legal challenges for people who use it the wrong way. That's one approach we've taken to try to mitigate some of the dual use of AI in general.

As you look at the world of open source versus closed source, one of the things that's been happening is this: before, when many of the industrial research labs, the Googles and OpenAIs of the world, published a model, they would also publish its architecture, in a paper that goes in depth on how the thing works. The original Transformer paper was reasonably explicit. Now they're starting to curtail the amount of information that comes out with each incremental model.

Do you think that puts open source at a disadvantage? How do you think about the future, particularly on the large language model side? When I look at the image generation models, they tend to be reasonably inexpensive to train and more open-source heavy, so it really seems like the foundation models that need massive scale and compute are where this could become an issue. Are you concerned about the decline in publishing that's starting to happen, and how do you think about the delta between open and closed-source models for big foundation models?

Yeah, it's definitely a challenge. It's good to remember that we got where we are today thanks to open science and open source. Every system around today stands on the shoulders of giants. If there weren't research papers for BERT, for Transformers, for T5, for GPT, maybe we would be 50 years away from where we are today. I think that's what created the massive positive loop that made progress in AI faster than anything we've seen before. If we stop doing that, it's going to slow down; it's going to take more time and we'll move slower as a field. But one thing we're seeing is that nature abhors a vacuum; I think that's the proverb, right? So if some companies and organizations start to do less open research or less open source, other organizations will take over and actually reap the benefits. For example, we're seeing a lot of decentralized collectives.

There's EleutherAI, which announced its non-profit a few weeks ago. You have organizations like Allen AI in Seattle, and organizations like Stability AI and Runway ML. You have academia coming back into the picture: the original Stable Diffusion was built at a German university, in a research group called CompVis, and you see Stanford doing more and more in open source and open research. So I think that's what we're going to see: different sets of organizations taking over and contributing to open research and open source, because at the end of the day it's not going anywhere. If you look at traditional software, there's always open source and closed source, and open science is not going anywhere, because the goal of most scientists is to contribute to society and not just to make a company money. Maybe the types of companies doing open research and open source are going to evolve, but I'm not too scared about it. One proof of that is that the number of open-source models, open-source data sets, and open demos on Hugging Face has actually been accelerating for the past few months. And you're right in pointing out that we're a little bit behind on text; large language models are one area where proprietary is ahead of open source. But if you look at audio, one of the best things is Whisper, thanks to OpenAI, which is open source. If you look at text-to-image, Stable Diffusion is huge and probably bigger than any proprietary system. If you look at biology, chemistry, and time series, open source is also very powerful. So I think it's always some sort of cycle. Sometimes proprietary gets ahead thanks to companies doing a really good job, like OpenAI, which is doing an amazing job right now, but sometimes open source catches up.

Sometimes it's going to be ahead, sometimes a little bit behind. That's a fairly normal technology cycle, I would say.

Yeah, I think that's true. If you look at technology cycles, the really successful large open-source efforts that offset commercial ones often have a large commercial backer who wants to offset the activities of others. It's almost strategic counter-positioning. For example, in the '90s the biggest sponsor of Linux was IBM, because they were trying to counter Microsoft. And if you look at open-source mobile browsers, WebKit is backed by either Apple or Google, depending on the branch. Do you think somebody will emerge as one of the major sponsors of open source here? Does Amazon do it to offset Google and the relationship between Microsoft and OpenAI in the cloud? Is it NVIDIA? Is it Oracle? Is it a conglomeration of multiple parties, or do you think a government or somebody else may intervene in this case?

Yeah, I think a lot of big tech companies are well aligned with open science and open source. You mentioned some of them: Amazon has been a really good backer of open source, NVIDIA has been very supportive, and Microsoft has been supporting open source a lot too. So some of it is going to come from there. I'm also excited about more government involvement in democratizing access to compute, which has been one challenge for large language models. When we trained a model called BLOOM with the BigScience group, which at the time we released it was the largest open-source language model, we got support from a French supercomputer, Jean Zay. So I'm excited to see more of that when I look at how public policy and governments can have a positive impact.

I think providing compute to universities, independent organizations, or non-profits, in order to avoid concentration of power and create more transparency, is a very obvious way they can have a positive impact on society. So I'm also excited about the ability of public organizations to support more open source and open research in AI.

Yeah, that makes a lot of sense. If you look at the types of open source, there are going to be models of various sizes. And to your point on large language models: the rumored or public estimates are that GPT-3 took $10 million to train at the time, although it would be about $7 million now; GPT-4 was maybe $50 to $100 million if you were to do it from scratch; and maybe GPT-5 is $200 million and GPT-6 half a billion, or whatever it is. You keep scaling up cost, so you need these large sponsors to stay at the cutting edge at all times. But being one model behind may be dramatically cheaper, so it's interesting to ask how that world evolves relative to government intervention, corporate intervention, or other ways of sponsoring these models.

With the caveat that, while we've seen that some scaling is good, we don't really know whether scaling is what drives the current emergent behavior, to be honest, and that's one of the challenges of the lack of transparency right now.

That's actually a really interesting question. What do you think is the basis for the emergent behavior, and what do you think is the biggest driver of scale going forward? Is it compute? Data? Algorithms? Something else?

I think we're starting to realize, and to build a better consensus in the science community, that data, not only its quantity but its quality, is starting to matter more than blindly scaling compute. But it's important to remember that training a very good large model today is still very much an art. It's not a simple recipe of "you have good data and a lot of compute, so you'll get a good model." It's still a very difficult, hard-to-understand technical endeavor. It's almost like alchemy that a very small number of people really manage to do today; maybe it's 20 people in the world, maybe 50.

It's a very small number. I think people sometimes don't realize that, so there's a lot of progress to be made on understanding the techniques to get to a good model, almost independently of compute and data.

Why do you think it's such a small number of people?

It's a billion-dollar question. If it were easy to know, I think everyone would be doing it. Interestingly, I think it's a mix of technical skills, science skills, and almost project-management skills, which is a unique combination. It's not just a matter of doing the right training; it's knowing how much more training you want to do. It's a matter of understanding when you want to release things, when you want to keep optimizing before launching your training run, when you want to start the big three-month or six-month training run, and when you should keep experimenting. So it's a mix of all of that, which makes it super hard but super fun at the same time. If it were too easy, it wouldn't be fun. But hopefully it gets easier and more democratized, so that everyone can take advantage of it, learn from it, and, as we said before, build better systems for each organization.

Where do you think the most exciting areas of AI research are right now, or where do you wish more people were working?

It's fun to work on text, and I'm only here for a short period of time, but I went to a couple of hackathons and there's some really cool stuff. Still, I think it's interesting and important to work on more technically challenging problems right now, especially in other domains. I'm super excited about biology: how do you apply AI to biology? How do you apply AI to chemistry? Both to have a positive impact on the world and to differentiate yourself by building a more technically challenging stack for AI. So these are some of the things I'm excited about right now.

And how do you think about general-purpose models versus niche models? I feel like there are two views of the world, and maybe neither is fully correct. Some people argue that you just keep scaling up models, making them more and more general, and eventually they can do anything. On the other side, people say you should build a focused, small model targeted at the specific thing you're trying to do, trained on the data set for that task; it can be highly performant, and you don't need to wait for the big generalization. Where do you think we'll be in three or four years?

Yeah, that's a good question. I've tried to stop making predictions in AI because it's too hard these days. I say something, and two or three months later it goes completely the other way and I look like a fool. So I won't make too many predictions; instead I usually try to look at the past and at data points. Since ChatGPT was released, companies have uploaded over 100,000 models to Hugging Face, and I don't think companies train models for fun: if they could use something else and didn't need to train, they would do that instead. Another interesting data point is that if you look at all the models on the Hugging Face Hub, the most used ones are actually models from 500 million to 5 billion parameters. I think the reason is that a more customized, specialized model is, first, simpler to understand and iterate on. It's faster most of the time, so it can sometimes run on device, on your phone, or on specific hardware; it's cheaper to run; and when you specialize it, it actually gets you better accuracy for your specific use case. For some applications, like a customer-support chatbot where customers are asking for their last invoice, you probably don't need the chatbot to be able to tell you about the meaning of life and the weather in San Francisco. You just need it to be really good at your specific use case, and what we're seeing is that a more specialized, customized, smaller model is usually a better fit for that. But for some use cases, like if you're Bing, for example, and you want a general search engine that can answer all these questions, a larger, more general model obviously makes sense. Ultimately, I think there are always going to be all sorts of different models.

It's the same way there are all sorts of code bases. Today you don't really say, "Oh, my code base is better than yours." You don't say Stripe's code base is better than Facebook's code base, right? It's true. [laughter] They just do different things; they answer different questions and solve different problems. The same goes for models. There's no one model that is better than the others. It's more about which model makes sense for your use case, and how you can optimize it for that use case.

One of the last questions I wanted to ask before we open things up to the audience is about business models and business opportunities. Ali Ghodsi, the co-founder of Databricks, has a really good framework for open source. He says that you first start out with some open-source software, and just making that work is like hitting a grand slam in baseball; then you put down the baseball bat, pick up a golf club, and have to hit a hole in one to get a successful business. So it's almost like you need two miracles to build something in open source that is sustainable as a company as well as a product. How do you think about monetizing Hugging Face, and what are some of the directions you're pursuing for that?

Yeah, I don't know if I totally agree with this analogy, because I think open source also gives you superpowers and lets you do things you couldn't do without it. For us, like I said, we're kind of random French founders, and if it wasn't for the community, for the contributors, for the people helping us on open source, for the people sharing their models, we wouldn't be where we are today. So it also creates new capabilities for us, not only challenges.

The way we've approached it is that when you have an open platform like Hugging Face, the way to monetize is always some version of a freemium model. We have 15,000 companies using us right now, and 3,000 companies paying us to use some of our services. Usually they pay for additional features, like enterprise features: some companies want security, or user management. Or they pay for compute: they want to run on faster hardware, they want to run inference on the platform, or they want to run training on the platform. With that, we've found a good balance: if you're a company actually contributing to the community and the ecosystem, releasing your models in open source, it's always going to be free for you.

And if you're a company that's more taking advantage of the platform, then you contribute in a different way: you contribute financially, by helping us monetize and keep working on this. We're still early on that, but we've found this differentiation between the two, which allows us to keep working for the community, keep doing open source, and keep contributing in alignment with our values and what we want to do, while at the same time making it a good, sustainable business that allows us to scale and grow our impact.

You mentioned the community a few times, and I think Hugging Face is one of the most beloved products and communities in the AI world. Were there specific tactics you used to build out that community, or things you felt were especially important in the early days? How did you evolve something so powerful from a community perspective?

I would say just the emoji. Having the hugging face emoji as a logo and as a name, that's all it took to get the love of the community. No, it's hard to say. We're really grateful. One of the things we've done that we've been happy with is that we never hired a community manager. It's a bit counterintuitive, but it led every single Hugging Face team member to share the responsibility of contributing to the community, talking to the community, and answering the community, instead of having a couple of dedicated team members and researchers saying, "Oh, I'm not going to do the community work because we have a community manager." For example, everyone in the company can tweet from the Hugging Face Twitter account. So if you see tweets from the Hugging Face Twitter account, they're not from me or from a community manager; they're from any of the Hugging Face team members. That was a bit scary at the beginning, especially as we grew. We haven't had any problems yet, but I apologize in advance if at some point you see a rogue tweet. That might be a team member.

That's a smart approach: you can always blame someone else.

Exactly. Maybe it's going to be me doing the bad tweets, and I'll be able to say it was an intern or something like that.

You must have a lot of interns, just in case. That's smart. And then a last question from me before we open up to the audience: what do you wish more startup founders were working on, or where do you think there are interesting opportunities for people to build right now?

I'm a bit biased on that, but I wish more startup founders were actually building AI, not just using AI systems, because I think there's a big difference, the way I see things. In the early days of software, you could use an API, or you could use Wix or Squarespace or WordPress to build a website, and that's good. That's a good way to get something up quickly, and you can do beautiful things. But I think the real power came from people actually writing code and building technology themselves; that's how you get the power out of this thing. It's the same for AI: you can do something quickly, but ultimately, if you really want to be serious about AI, you need to understand how models work, how they're trained, and how you can optimize them. That's also what's going to unlock the most potential for truly great startups, great products, and companies that are differentiated from incumbents just adding some AI features. And we're seeing a lot of these companies. For example, Runway ML announced the release of their text-to-video model, I think today or yesterday. That's a good example of a really AI-native startup that is actually training models, building models, really building AI, not just using AI. So that's one thing I usually recommend startups do. Or, if you're just using AI, build your company accordingly, knowing that your moat or advantage, especially at an early stage, won't be so much in technical capabilities but more in getting customers or users. You wrote a beautiful article on moats for AI that I would recommend reading. In my opinion, those companies need to take advantage of other kinds of moats than technical ones.

Okay, great, let's open it up to the audience. If there are any questions, maybe we start in the corner right there.

Hi. You mentioned something about open source thinking. It's good because you give it to good actors, and bad actors can also use it, but there are probably more good actors. How do you usually respond to claims like OpenAI's, that they don't share any details about their models or open source anything because they're concerned about AI safety?

I respect everyone's approaches. Different organizations have different ways of seeing the future, or the way technology is being built right now. I have a bit of a different view. For me, if you look at the development of technology, the biggest risks usually come from concentration of power and from some technologies being built behind closed doors. If you build things in the open, you actually create a much more sustainable path in the long run for this technology to be embedded in society in general.

It also makes it possible for regulators to create the regulatory framework for these technologies, and for NGOs and civil society to weigh in. So I think we're starting from very different positions, philosophically speaking, but in my opinion that's not much of a problem for the ecosystem. You can have different organizations with different points of view; the most important thing is that what your company is doing is aligned with your company's values.

Okay, any other questions? Right there in the middle.

...this paradox of how companies preserve the privacy of their data while improving the models, you know, by in-housing open source models. Is the solution approaches like federated learning? I'd appreciate your thoughts.

We've been working a little bit on more distributed or decentralized training, which is still hard to do, and I think nobody has really figured it out yet. But when you asked me about my interest in scientific progress, that's one area where I'm really excited to see more people working.

But the more practical answers and solutions today are on-device models, and there are also some solutions embedded in how you train models. For example, we're leading an initiative called BigCode, which released a few weeks ago the biggest open repository of code that people can train code models on. It's called The Stack, and the interesting thing about it is that we gave people the ability to opt out of the dataset before the model was trained. I think you saw last week the release of the Adobe model, which has also been really good about training on good data, where users have actually opted in to the training.

So these are also some important developments in the field, where you want to be a bit more intentional or more transparent about the data. One of the challenges is that for a lot of these systems today, we don't really know what they've been trained on, because there's no transparency about it. I wish there was, so that we could have a better understanding of what they're capable of doing with which data, and then find solutions to make sure they stay privacy-preserving for people.

But there are some good developments, and I think we're making a lot of progress there.

Do you think we're going to end up with something like the robots.txt we have right now for search? Do you think we'll have something like an ai.txt for sites to be able to opt out of use in AI datasets?

Yeah, probably. We'll need norms around that for sure, around consent for AI. I think that's really important, for example, for artists, digital or non-digital. It's important for attribution and for the distribution of value, because we want people who are contributing to be rewarded for it. There's an interesting question that I don't think has a good solution right now.

In a world where search is only a chat interface, what rewards the underlying creators of the content? Say I build a website, and before I was making a living because I was getting traffic on this website so I could run ads. If the results from this website are now shown in a chat answer without mentioning me as the content creator, what's my incentive to create this content? Will people just stop building websites because they don't get the attribution or the reward for it? These are very important questions. I think we're just scratching the surface of what needs to be done to resolve things like that.

Maybe back there towards the top.

...very expensive. So what are the moats that can be built on top of them? There's data, there's human feedback, there's maybe, to some degree, being good at prompting. Are there any other ways you feel startups can build moats?

That's a good question. It depends a lot on your skills, your background as a team, and what you're excited about. I think there's a lot to be built around specializing for a specific domain, a specific use case, a specific industry, or specific hardware. That's what I'm most excited about: trying to leverage specific expertise, a specific domain, or a specific kind of problem that others, bigger players, are not going to be focusing on.

As I said, for example, biology, chemistry, time series: all these domains where you don't see as much activity are, I think, a good way to have more time as a startup to build your differentiation and your tech stack, to the point where you're not at the mercy of much bigger players or other startups releasing exactly the same thing as you and making you lose your edge. So those would be some of my recommendations.

But again, I've told the story of Hugging Face: how we started with an AI Tamagotchi and ended up where we are. So one of the main things is just to start working, start building, listen to the signals you're seeing, iterate on things, and I'm sure you'll land on something you're excited about at some point.

Okay, great. Unfortunately, we're out of time. For the next 55 minutes, feel free to hang out here. Thanks to Stripe for being such a gracious host. This is also an opportunity for you to meet other folks who are very excited about and working in AI. Thank you again as well to Clem; this was awesome.

Thank you, I really appreciate everyone making it tonight.
