Exploring the relationship between artificial intelligence and data following the virality of ChatGPT
Exploring the relationship between artificial intelligence and data following the virality of ChatGPT was the main topic of PermaDAO's weekly Thursday space. Here is the transcript, so let's dig in!
Coral: Thank you all for coming to PermaDAO’s weekly Thursday space at 23:30 Beijing time. This week we have two guests from the U.S., so we chose to start today’s topic at 23:30 to avoid the effects of jet lag to the greatest extent possible.
I think everyone must have noticed ChatGPT, the latest AI hit out of Silicon Valley, which is why we chose it as the topic for this space. I think everyone who joined wants to know just how far this AI can stretch our imagination. Without further ado, let's invite our guests to introduce themselves.
QiZhou: Hello, everyone. I’m glad to be here today.
I'm QiZhou, and I'm very happy to discuss the relationship between AI and data with you today. Our current project is called EthStorage, a data network built on Ethereum smart contracts. It uses Ethereum technologies including data availability (DA) and zero-knowledge proofs. We hope to inherit Ethereum's security while improving storage performance through layer-2 techniques. I'm very happy to discuss how blockchain data, and data across Web3 more broadly, might work together with other technologies, including artificial intelligence, in the future.
Coral: Dr. Zhou and Mai are our guests from the U.S., so we’re glad you’re here at 7:30 a.m. to discuss this with us.
Mai: I'm MaiMai. I'm based in the US, currently doing investment and research at a crypto venture capital firm. I have a close relationship with Stanford and I'm very interested in technology companies, so I appreciate this opportunity to discuss with all of you today.
Erica: Thank you, Coral. Hello, I'm Erica and I'm a crypto analyst. We noticed AIGC recently and have written a series of about four or five reports on it. Starting from the underlying science behind AIGC's popularity, we worked through and discussed the questions step by step. Recently we also tried writing a research report with the red-hot ChatGPT itself, so I can share some thoughts from that experience.
Red Army Uncle: Hello, everyone. I'm Red Army Uncle. My main focus used to be the Cosmos cross-chain ecosystem, where I launched an IBCL community to discuss it. Recently I turned my attention to ChatGPT. After it became popular, I mainly discussed the product's user experience in the community; of course, I also developed a certain understanding of the technical details, and I set up a Telegram community to exchange ideas with friends. Excitingly, our community has gathered nearly 1,600 people, and everyone uses the product in all sorts of ways, which helps us get more value out of the tool. We have been accumulating ideas throughout the process. Thank you again, and I hope to share the experiences the community members and I have had with ChatGPT.
LongMan: Hello, everyone. My name is LongMan and I am the co-founder of Daw. I previously worked at an AI company and joined the industry in 2017, then moved mainly into the field of Web3 data analysis. We also worked on natural language early on and have followed its development closely. One of the products we are working on is a Web3 data analysis engine, mainly serving operations. I'm glad to join the space and share my ideas with all of you today.
Coral: The first question today is actually quite simple, because I believe many in the audience were drawn in by the topic itself. So the first question is for the guests to explain to the audience what ChatGPT is, and how it gained such huge attention.
QiZhou: Yes, although I have done some AI work myself before, I'm not really an expert in this technology. But the whole event is worth learning from, especially from a Web3 perspective.
Personally, looking at how it went viral, it was very spontaneous community behaviour. There are a lot of communities out there, and one day many people just happened to be talking about it, including our own Web3 industry research community. People would ask it all kinds of questions and get some interesting answers.
Of course, there were also some entertaining essays. For example, one community member asked ChatGPT to write an article about Zhou Qi replacing Vitalik as the CEO of Ethereum. This is false news, of course. But ChatGPT took it seriously, tried to understand the content according to our intentions, and then wrote an essay that I think is definitely better than anything I could write. Everyone assumed it was impossible to write such an article from such obviously fake news, but ChatGPT's embellishment and the details it supplied produced a very convincing piece of writing.
It goes beyond the Turing test: not only can it convince you it is a person, it can make you believe fake things and act on them. This has taken many people aback at the current level of AI. There are even more striking examples, such as asking ChatGPT to write a piece of code and finding that the generated code actually runs without a problem. Having ChatGPT write a smart contract, or search for bugs in one, is amazing as well. On the other hand, I included some traps in my personal tests and got some interesting responses. The more you keep talking with it, the more deeply you understand its practical value, which is how it reached a million users in four to five days. Reaching users at that magnitude would take a Web3 product a very long time, so I think both the product itself and its promotion are well worth learning from.
Mai: ChatGPT is an AI chatbot, and what sets it apart from other chatbots is its very strong ability to keep context, which means you can hold a real conversation with it. I believe this is what people noticed when it broke out of the circle: not only can it answer all kinds of strange questions, but if I ask a question, then follow up, and follow up again, ChatGPT understands very well what I am trying to ask and accurately determines the questioner's intention. This is a big difference from many traditional AI models. As for the underlying reason, let me briefly explain. Many past models did retrieval by relevance. You may have read an article by ZhenFund with an example like this: the training set contains a passage explaining that birds migrate south in winter because the winter in the north is very cold, but when you ask why birds migrate south in winter, the model replies that it is because the winter in the south is very cold, a rather pointless answer. ChatGPT, by contrast, uses a reward model, which gives it a very strong ability to infer human intention. Combined with huge amounts of data and a better algorithm, this gives it such strong interaction ability.
Let me say a bit more about how this reward system differs from the traditional approach. Whatever the model, training requires a lot of data, and that data often requires human annotators to label it manually. The traditional setup, used before ChatGPT's reward model, already gave models the ability many other AI systems have: generating responses in text form. With the reward model, after a response is generated it is placed alongside answers provided by human participants for preference sorting. How are they sorted? The sorting is based on answers from a large number of human participants to the same question, each ranking the answers independently of the others. For example, I might rank answer three above answer two and answer two above answer one, while another person might rank two above three and three above one. When the AI learns, it refers to all of these rankings, and slowly it learns which wording, or which answer, is the one humans want to hear. This logic, combined with a large amount of data, is one reason its interactivity is particularly strong. I've said a lot, but I think this makes it clear why it stands out: its interactivity has been improved to an unprecedented degree.
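The preference-sorting idea Mai describes can be sketched in a few lines. The scoring function below is a made-up toy, not OpenAI's reward model; it only illustrates the Bradley-Terry-style logic where the answer humans rank higher should receive a higher preference probability.

```python
# Toy sketch of preference ranking behind a reward model (RLHF-style).
# The `reward` function is purely illustrative, not a real trained model.
import math

def reward(answer: str) -> float:
    """Stand-in for a learned reward model: here, longer answers
    score higher purely for demonstration purposes."""
    return float(len(answer.split()))

def preference_probability(better: str, worse: str) -> float:
    """Bradley-Terry style probability that `better` is preferred,
    derived from the difference in reward scores."""
    return 1.0 / (1.0 + math.exp(reward(worse) - reward(better)))

a = "Birds migrate south because winters in the north are cold."
b = "Because it is cold."
p = preference_probability(a, b)
assert 0.5 < p < 1.0  # the higher-ranked answer gets higher preference
```

In training, the reward model's parameters are adjusted so these probabilities agree with the human rankings, and the chatbot is then tuned to produce high-reward answers.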
Erica: Hi everyone. I think ChatGPT first started going viral when we noticed lots of screenshots of people's ChatGPT conversations in our feeds, especially from well-known people such as Elon Musk and Vitalik. Once we started trying it, I think one of ChatGPT's biggest advantages became clear: it is not just chat software. We actually used ChatGPT to write a research report about AIGC. While writing that report, I wondered whether I could have ChatGPT write a report on AIGC itself, and it turned out it really could.
I think the most important skill for human beings in the future is learning to ask questions, especially important questions. Given a fixed framework, the AI becomes the equivalent of a huge search engine; with its aggregation and integration of material, it can replace, say, 50% of a researcher's workload, which looks very enticing to me and to most researchers.
And not only researchers: editors and writers, for example, can get a large part of their work done just by entering some keywords. We also found it leaves a lot of room for imagination. After I asked ChatGPT to write the AIGC report, I also asked it for a title, and it gave me two names. You will find that basically all your work can be done with it, and with enough information it can even produce a decent AI-generated picture. So compared to previous chatbots, I find it very attractive, and I think ChatGPT will stay viral because it genuinely helps us solve problems.
Red Army Uncle: My take may be a bit more excited, because the usual descriptions no longer capture my view of this thing. The first word that came to me was "wow", which is what I wrote the first time it appeared in my feed, and the keyword of the title I shared was "a disruptive product that will change the world". Yet many people using it don't seem to feel this; they keep testing the product's boundaries just to confirm the view that it is nothing special. When we talk about blockchain, we are talking about production relations; ChatGPT, by contrast, is a major improvement in productivity. Much previous AI was basically a toy that couldn't help us solve practical problems, but ChatGPT actually helps us do concrete things. If anything here is still the means of production, it is data: for this AI to succeed, it needed huge amounts of data, and this breakthrough is essentially a result of that data. So I think it has a major impact overall.

As for what ChatGPT is, there are a few keywords I would offer. First, it is a superb personal digital assistant that can help you solve specific problems. It is also like scaffolding that helps you quickly extract a summary of things. It can even be your secretary, and a relatively humane one: unlike some earlier AI that drowned you in information, it can iterate on your needs and release the necessary information gradually, rather than stuffing hundreds of gigabytes of data at you all at once, which we ordinary human beings could never digest in a short time.
Then compare it with Siri, Wikipedia, and Google, which we have repeatedly set against ChatGPT in this process. I think its essential victory lies here: those earlier products have no concept of continuous dialogue. All the answers are given at once, and the conversation ends with the response; it is static question-and-answer. A dialogue with ChatGPT is not tiring, because it is genuine interaction between people and AIGC, with complementing and correcting on both sides. We can tell the AI that what it said is not correct and give it feedback on what is; even for an answer I find insufficient, I can add to it. This time the AI is a continuously, dynamically iterating thing, and I think that process is where it shows vitality.
The most intuitive application, I think, is at the code level: programmers, even junior programmers, get a big upgrade and can sometimes copy the generated code and use it directly, which is really impressive. It's very common for our community members to use the tool to build something directly for their own work. For example, one member gave it the general framework of a project he wanted to run, asked it for suggestions, and under its guidance ended up with a very complete operational plan. That surprised us as well.
The last thing, about the mechanism: I saw some data I found quite interesting about the upgrade path behind it, from the first GPT, through GPT-2, to the latest, GPT-3. Each stage brought a large jump in parameters, reportedly on the order of ten times or more growth over the previous version.
Longman: The other guests have all mentioned ChatGPT's advantages, and we can see it is very much like an information assistant that can do a lot of things. What I would emphasize is that it is open-domain. When we were developing natural language processing, we usually did better in a vertical field, that is, only in a small domain such as e-commerce or security; building a model that could answer all kinds of questions in an open domain was much harder.
ChatGPT's output is of very high quality, and we find its content self-consistent: it can handle many common problems, and for the questions we often ask, it gives very coherent answers. From a programmer's point of view, its code editing and error-correction ability is far beyond what we imagined. It can free up a lot of programmer productivity for new and creative work, which I think is very promising.
The last point: I think ChatGPT has changed everyone's perception of artificial intelligence. We used to joke that "artificial intelligence" was really artificial stupidity, but now artificial intelligence is finally showing some intelligence.
Timtim: I majored in high-performance supercomputing, graduated in Edinburgh, and I'm now on a development team at Binance.
Our school mainly focuses on NLP, natural language processing, and supercomputing. Competition in NLP is fierce, as it is across AI, and with such fierce competition the technical progress has been substantial. From the first version of GPT to the current one, I think GPT's popularity is not without reason. Today, more non-technical people see the craze; from a technical person's point of view, though, the generated code can be quite outdated. For example, if I ask it to write a contract, it produces code for an old toolchain (around version 0.5), because its dataset only goes up to 2021. But looking at its overall development, it has a lot of potential, and we need to keep an eye on it, because it has the opportunity to help us do things that would otherwise require a lot of manual work. For example, I saw someone on Twitter keep talking with it until it helped draft a VentureDAO, which I think is very promising, and many people have quietly started using it for real work. Some big companies have now explicitly forbidden using OpenAI to get answers, so we can see it is already having some impact on our industry. The impact may not be as big as we think, but we should not underestimate what it can bring.
Coral: So let's move on to the second question, which some of you touched on just now: what is the significant improvement in ChatGPT compared to the artificial intelligence we had before?
QiZhou: I think the main improvement is in understanding the questioner's intention. For example, when I ask a first question, I may not get the answer I want, but through the feedback of multiple interactions I can keep describing the question more clearly, so that ChatGPT understands my intention more deeply and replies with the answer I had in mind. From a developer's perspective, discussing with many communities, this multi-round refinement is a natural process. When you talk to it, you feel you may be communicating with a real person who is using their knowledge base to understand the question, deduce your intention, and give you the answer you want.
People often mock artificial intelligence for being clueless, because very simple questions can elicit funny answers. Some AI experts set lots of traps, especially questions involving non-public knowledge and logical reasoning, and ChatGPT still falls into some of them. It may pass the Turing test, yet still fail on certain obscure knowledge, so there is a lot of room for improvement. I look forward to a fourth or fifth version with more advanced features.
Mai: I think the core thing is the human feedback system I mentioned earlier. It wasn't invented for ChatGPT, but iterating on it can really help the model improve in all respects, and that is what I look forward to.
Erica: Compared to other AIs, I think one of its biggest features is that it uses a very large model. Since it is also a natural language processing system, I looked into how it differs from Google's transformer-based model.
First of all, it uses 175 billion parameters, while Google's uses only 60 billion, nearly a threefold difference. The extra parameters really help ChatGPT reach a better state in its replies, including its understanding of our semantics. In addition, you will find it quite receptive while communicating with it: it is willing to accept feedback and adjust its answers accordingly. For example, when I find an answer wrong, or I'm not satisfied with it, it can make adjustments according to my needs. Compared to other AI, that is a fairly distinctive feature.
Red Army Uncle: First of all, I think there are many similarities between how ChatGPT works and previous AI. Like AlphaGo, it involves a continuous process, but what impresses me more this time is that the dialogue process is fully realized. It is continuous; it is no longer a simple one-shot question-and-answer process.
Secondly, I think this time the AI really shows some ability to think. Calling it "thought" may be too strong, but I do believe it has a certain capacity for it; I even think it has mastered a basic underlying methodology of thinking. In our community discussions we even felt it seemed to have grasped first principles. A few concrete examples: its ability to switch between different languages is very natural. At the bottom layer it may still be zeros and ones, but at a higher level it can switch among PHP, Java, and other languages at will, and it is not obvious how it achieves this. The underlying layer preserves the essence of a thing while the top level varies it in different forms, giving you whatever you want, so it feels very flexible. This makes you feel as if it has mastered a universal ability, which leads to the next point: it is general-purpose artificial intelligence, not limited to one vertical field. It now feels a bit omnipotent to us, or at least we have formed that illusion.
Longman: Back when we worked on natural language processing, we started with rules and decision trees, and most of the time many features were set and selected manually. With deep learning and now ChatGPT, these features can be extracted automatically: the model does supervised training to form an understanding of the input text, a unique understanding, just as Red Army Uncle said, and what is trained internally is a neural network model. Each phrase is expressed as a string of numbers, so text can be vectorized. Once vectorized, its understanding moves closer to first-principles thinking, which is how it can switch between different languages, and deep learning on top of that yields today's large models. It now solves many sequential problems because it is contextual; unlike before, where many things were a single question-and-answer, it is a continuous, recursive process. It is sequence learning, so it can understand and analyze many things, going from understanding to analysis to generation. It can therefore be regarded as a new breakthrough in cognitive intelligence.
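Longman's point about "turning text into a vector" can be illustrated with a minimal sketch. Real systems use learned embeddings from neural networks; this bag-of-words version, with a made-up vocabulary, only shows the idea that phrases become strings of numbers a model can compute on.

```python
# Minimal illustration of vectorizing text: map a sentence to a
# fixed-length count vector over a small (hypothetical) vocabulary.
from collections import Counter

def vectorize(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["birds", "migrate", "south", "cold", "winter"]
v = vectorize("Birds migrate south because the winter is cold", vocab)
print(v)  # [1, 1, 1, 1, 1]
```

Modern models replace these counts with dense learned vectors, which is what lets the same underlying representation express the same meaning across different surface languages.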
Timtim: It progresses gradually through continuous interaction: if you keep going, layering problem upon problem through your prompting, you can end up with the result you are looking for. It is a very successful culmination of years of work in artificial intelligence and natural language processing. Where it goes next is what we should really focus on, including whether it becomes intelligent enough to eventually replace humans in certain work.
Coral: The third question is close to our theme, which is data. As host of this space, PermaDAO is a co-built DAO in the Arweave ecosystem, and Dr. Zhou is from EthStorage, so we are all involved in this area. The question is a long one: what is the path and basis for artificial intelligence to achieve adequate intelligence? How important is having enough data samples for AI? And under that premise, is persistent data a one-time training tool, or a resource that can be called repeatedly and left for human beings to interpret?
QiZhou: It's indeed an interesting question. I've been in the industry for a few years, and people often talk to me about how to combine AI and Web3, whether through smart contracts or storage solutions.
First of all, training artificial intelligence requires a huge amount of data, but the training process is still very centralized, a completely different paradigm from the community-driven, distributed, decentralized approach we advocate in Web3. How to use our technologies, whether zero-knowledge proofs or others, combined with our large-scale decentralized storage capability, to produce applications as interesting as ChatGPT is a topic worth thinking about. In particular, ChatGPT has a very large dataset that we expect to keep growing, so could ChatGPT replace search engines like Google? A key issue is that ChatGPT needs a retraining process every time it ingests new data, whereas Google's entire search engine can index fresh network data very quickly. Obviously an AI model needs training time on new data before it can answer questions about it. So could we use large-scale distributed storage together with Web3's decentralized computing power to better solve the problem of fast training? That is well worth thinking about too.
Erica: While exploring ChatGPT lately, we were constantly testing it, and we found that its answers to some questions are seemingly serious but actually nonsense. For example, asked what 1/3 + 1/5 is, it produced a confidently wrong chain of arithmetic and an outrageous final answer instead of the correct 8/15. In that case, I think the problem lies in its algorithm or parameters. We also found that its performance in Chinese and other smaller languages is not comparable to English. Why? Essentially it is a training data problem: most data on the Internet is in English, so the model is more polished after training on English data. As training data grows larger and training gets more accurate, the demand for computing power grows too, and we also have to consider chips, since Nvidia chips are now so widely used. In the future, I think computing power itself may become the trend.
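The arithmetic Erica mentions is easy to check deterministically, which is exactly what a language model does not do: it predicts plausible text rather than computing. A quick sketch with Python's exact rational arithmetic:

```python
# Verifying the sum ChatGPT fumbled, using exact rational arithmetic
# from the standard library rather than text prediction.
from fractions import Fraction

result = Fraction(1, 3) + Fraction(1, 5)
print(result)  # 8/15
assert result == Fraction(8, 15)
```

The contrast illustrates her point: the model's answers are generated from patterns in its training text, so fluent-sounding but wrong arithmetic is a natural failure mode.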
Red Army Uncle: In my humble opinion, everyone has mentioned the big concept of data. As I understand it, there are several levels. One is the parameters mentioned earlier, where the quantity is relatively large. I would like to raise a question here: is parameter count a future trend, something like Moore's Law, where bigger means more powerful? Does it have an upper limit or a bottleneck? The second aspect is the dataset. As you said, the larger the data volume, the more accurate the model. But there is another problem our community raised: when you ask a question at different times, or when different people ask the same question, the answer seems to change.
In our discussion it was said that the AI corrects itself, so here is a new question: if the AI corrects itself through human feedback, and the feedback pulls in opposite directions, even three directions at once, which way does it go? Whom does it follow as it develops? If you give it a right answer and a wrong answer, will the AI prefer the right one or the wrong one?
The third question is about the timeliness of data. Assuming AI develops to a certain extent in the future, should we set a time limit for it? If AI's learning ability surpasses that of humans and it fixes a problem as soon as it finds it, before human beings have even reacted, is that a potential factor threatening human beings?
Just now, Dr. Zhou also mentioned the relationship between artificial intelligence and Web3. The founder of OpenAI follows CZ on social media, which may have something to do with blockchain, but what other connections might there be? A few years ago, some of our community partners debated whether AI should be developed in the blockchain field, and the discussion was heated. One view is that today's AI achieves such great results because of greater computing power, bandwidth, performance, and unique data, and the availability of that data seems to serve everyone well and produce good results. So can the public data in the Web3 world produce a similar, equally exciting effect? I am looking forward to the Web3 world providing enough of these ingredients, but at the moment I have some doubts about whether the integration of Web3 and artificial intelligence can really happen, and I would like to put that question to the guests as well.
Longman: Red Army Uncle has raised a few questions, and I would like to address them from my point of view. The first is whether more model parameters are always better. My understanding is: not necessarily. If there are more parameters but not enough data, the model ends up under-trained, meaning it cannot learn well enough, and in that case the extra parameters may not be useful. The other question is how to judge which result is better. ChatGPT has introduced a supervised learning step, where annotators effectively tell it what the objective function is. It can learn in several ways. One is learning from existing human text; we leave a lot of data on the network, and that is one learning process. The other is feedback: you can see a feedback button on ChatGPT, and through that feedback loop the model gets better and better.

Having answered those two questions, I would like to talk about the impact of permanent data, Arweave, on AI; I have been researching Arweave for about a year. I think Arweave is infrastructure for the safe evolution of future robots. Why? It is a public chain for permanent data storage, which means it can accumulate AI training data forever, and that data cannot be tampered with. Only then can the data used by robots be trusted; otherwise future robots might exhibit many unpredictable problems due to damage or interference with their training data. Even today, we saw friends in this circle ask ChatGPT what it might do if there were no interference and no restrictions, and the answer was: destroy human beings.
Although that answer sounds very frightening, it is actually quite possible. If this data is not saved on a blockchain network, both the data and the model may be tampered with; but if we put the model and the data on chain, we can see openly and transparently what the AI model is, how it works, and which version it is using. To prevent AI from messing around, we need to allow AI to have sovereignty, but that sovereignty should be established on the blockchain, not on an uncontrolled Internet, which would be a frightening thing. So I think Arweave is very well suited to storing AI training models and AI training data.

There is another very important point about Arweave that excites me: it proposes the concept of SCP, the storage-based consensus paradigm, for smart contracts. Arweave's smart contracts compute off-chain, a completely different approach from other smart contracts, and only such contracts can support very large amounts of computing. Ethereum does roughly 7 TPS, which makes AI computing basically impossible, but with Arweave's computing model it is easy to scale out at the edge and achieve very high computing performance, which is a very good support point for AI. Arweave smart contracts also support multiple languages, not only Solidity or Move but even Java or custom Python, and Python is the language AI uses today. Since Arweave can support mainstream AI programming languages, I think it will play a very big supporting role in the AI field in the future. It will be important infrastructure for the evolution of robots, helping AI own and buy data, and helping build a truly controllable model that can serve human beings. That is one of my thoughts.
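The tamper-evidence property Longman attributes to permanent storage can be sketched without any chain at all: if a content hash recorded at storage time no longer matches the data you fetch, the data has been altered. The "on-chain record" below is simulated with a plain dict; Arweave's actual transaction format and API differ.

```python
# Toy sketch of tamper-evident training data: verify fetched data
# against a content hash recorded at storage time. The dict stands in
# for an on-chain record; this is illustrative, not Arweave's API.
import hashlib

def content_id(data: bytes) -> str:
    """Content-addressed identifier: SHA-256 hex digest of the bytes."""
    return hashlib.sha256(data).hexdigest()

# Record made when the training data is stored.
training_data = b"birds migrate south because northern winters are cold"
onchain_record = {"content_hash": content_id(training_data)}

def verify(fetched: bytes, record: dict) -> bool:
    """True only if the fetched bytes match the recorded hash."""
    return content_id(fetched) == record["content_hash"]

assert verify(training_data, onchain_record)       # untampered data
assert not verify(b"tampered data", onchain_record)  # altered data fails
```

A permanent-storage chain adds the part the sketch cannot: it makes the recorded hash itself impossible to rewrite, so anyone can audit what a model was trained on.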
Coral: Thank you, Longman, that answer was awesome. I am excited not only because PermaDAO is a DAO co-built within the Arweave ecosystem. I have actually thought about this issue carefully before and have been discussing a question with my friends: to train a super-intelligent AI, you need to feed it a huge amount of data, but once that AI is no longer restricted by humans, shouldn't the data it was fed be kept in permanent storage, so that human beings retain the power to evaluate and judge it? The era of AI will come sooner or later. What matters is how humans respond. After all, AI is fed by humans with data.
Timtim: We have discussed a lot of points, and I would like to respond to the question of computing power. If you want to analyze, collect, and then tag a very large body of text, you will certainly need supercomputing; supercomputing support for this series of AI activities is inevitable, and the computing power involved is hard to overstate. For example, my university — considered one of the better universities in Europe — has two supercomputers, and my tutor specializes in OpenMP and MPI. About 70% of our university supercomputers' capacity is currently helping the British government run simulations of COVID-19 and its policy strategy. So large amounts of data must depend on powerful computing power — say, a million CPUs or a million A100 graphics cards — otherwise, at ordinary computing power, the calculation could take practically forever.
Regarding the Arweave data that Longman just mentioned: since the research direction of my final project is copyright, I used Arweave as my storage chain. I think the idea it proposes is very good, but there is one problem. The Arweave blockchain is an open, transparent world, and we can trace data back to exactly whom it belongs to. But this junction needs to be supervised, because the data is stored permanently and cannot be tampered with. If there is noise in it, it will greatly affect the final result of the AI's learning. So it is important to control that junction, because we do not want AI to learn from a lot of bad data and then have the final analysis results suffer.
Coral: Thanks, Timtim. In my opinion, this is also a very important point for the future development of Arweave: how to ensure the objectivity of data to the greatest extent, in a neutral way, is a very important proposition.
Timtim: I would like to add one more point. Because I work on text-based copyright protection, I went to verify something on ChatGPT a few days ago: I asked which modules I need for my project, and the answer was that I need modules for verification and a protection system, and then I have to verify whether my data is original. In the early days of my doctorate, I was studying the combination of blockchain data and supercomputing, but I stopped halfway because it was too tiring. Still, I believe this must be a direction for the future. Blockchain is very young — only around ten years old. Note that roughly 1.5 billion pieces of text were provided to OpenAI for training, containing a lot of our past data, and as this industry continues to progress we will create even more data. The combination of AI and on-chain data will eventually become a trend, because in the end, all of our different disciplines — physics, the humanities, linguistics — involve data analysis. It is a good thing that Arweave has a tag system, which saves some of that workload, since each person already tags their data when they upload.
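Timtim's point about Arweave's tag system — uploads arrive already labeled, which saves curation work — can be illustrated with a small Python sketch. The chain is replaced here by an in-memory list, and the tag names ("Content-Type", "Topic") and transaction IDs are illustrative assumptions; real Arweave tags are arbitrary name/value pairs attached to each transaction.

```python
# Stand-in for tagged transactions on the chain
uploads = [
    {"id": "tx1", "data": "a physics paper",
     "tags": {"Content-Type": "text/plain", "Topic": "physics"}},
    {"id": "tx2", "data": "a linguistics corpus",
     "tags": {"Content-Type": "text/plain", "Topic": "linguistics"}},
    {"id": "tx3", "data": "a cat photo",
     "tags": {"Content-Type": "image/png"}},
]

def select(uploads, wanted):
    """Return uploads whose tags match every requested name/value pair."""
    return [u for u in uploads
            if all(u["tags"].get(k) == v for k, v in wanted.items())]

# Building a text-only training subset is a single tag query:
text_only = select(uploads, {"Content-Type": "text/plain"})
print([u["id"] for u in text_only])  # ['tx1', 'tx2']
```

Because uploaders attach the tags themselves at write time, assembling a discipline-specific training set later is a filter over metadata rather than a re-labeling pass over the raw data.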
QiZhou: When we first designed EthStorage, we drew on Arweave's idea of storing data forever, and then thought about what future large-scale storage would need while maintaining, or even upgrading, the quality of the persisted data. Our idea for EthStorage was to allow data to be stored forever, but also to allow it to be deleted or modified according to contract logic on the chain. For example, suppose I have two data sets, one of higher quality than the other. The data is stored in a decentralized way, but I can still use a DAO as an organizational form to govern it on chain — for instance, assigning scores to the data to improve its quality — all of which can be achieved through smart contracts. On Arweave, data can only be owned by a person or an organization, whereas here data can be owned by a smart contract, so whether data can be deleted is decided solely by the contract. We can also use artificial intelligence to do some off-chain governance, with the results verified on chain in a decentralized way using ZK. At the same time, we can use token incentives to encourage honest feedback: if a data set improves recognition accuracy, we can reward or donate to everyone who provided it in various ways. Through Ethereum-based storage, especially programmable storage, we will have much more room to explore future application scenarios for persistent storage.
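QiZhou describes contract-governed storage where DAO members score data sets and the contract's own logic decides what is retained. The following is a toy Python simulation of that idea only — the class name, the mean-score voting rule, and the 0.5 retention threshold are all assumptions for illustration, not EthStorage's actual contract logic.

```python
class GovernedStore:
    """Data owned by 'contract logic': entries with poor scores are dropped."""

    def __init__(self, keep_threshold=0.5):
        self.keep_threshold = keep_threshold
        self.data = {}    # key -> payload
        self.scores = {}  # key -> list of DAO scores in [0, 1]

    def put(self, key, payload):
        self.data[key] = payload
        self.scores[key] = []

    def vote(self, key, score):
        # A DAO member scores the quality of a stored data set
        self.scores[key].append(score)

    def enforce(self):
        # "Contract logic": drop any entry whose mean score is too low
        for key in list(self.data):
            votes = self.scores[key]
            if votes and sum(votes) / len(votes) < self.keep_threshold:
                del self.data[key]

store = GovernedStore()
store.put("clean-set", "high-quality labels")
store.put("noisy-set", "mislabelled samples")
store.vote("clean-set", 0.9)
store.vote("noisy-set", 0.2)
store.enforce()
print(sorted(store.data))  # ['clean-set']
```

The contrast with pure permanent storage is the `enforce` step: ownership sits with the rules rather than with a person, so deletion or retention follows from scores the community submits.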
Coral: Thank you for coming today, I believe all of us have learned a lot today and I look forward to more exciting events in the future!
Join PermaDAO, Build Web3, Let’s do it!
Translator: John Khor @ Contributor of PermaDAO
Reviewer: Krollie @ Contributor of PermaDAO
Telegram / Discord / Twitter