openai / requests-for-research
A living collection of deep learning problems
Home Page: https://openai.com/requests-for-research
Hi!
I recently tried the "Train a language model on a jokes corpus" request. For that purpose I crawled a few joke sites and trained a char-rnn language model on the result. However, the output wasn't really funny.
Here's the dataset that I used. Please note that some of these jokes are really creepy. I tried to filter some of them out, but it's simply crawled from joke sites, so there is only so much I can do about it.
OpenAI is "OPEN" but GPT-4 is not even available for free users, even for only ONE TIME PER DAY.
I'm trying to download the dataset for Description2Code and the link provided only has one of the 9000 examples promised.
Hey there,
No content seems to be rendering on the right-hand side of the page at https://openai.com/requests-for-research/ in either Chrome or Safari on OS X.
I cloned things down locally, ran make, and content seemed to render, so I imagine this is something on the server operations side.
A bit of background about me: I co-founded a startup in the security industry, and my main focus is collecting security-related data (Spam, attacks, ...) to identify infected or otherwise compromised hosts on the internet. To that end we built a sensor network that collects ~500 million security events per day, ~70% of which is Spam alone. I did a lot of research on this data to find out what Spam looks like, where it originates, and what mechanisms were used to send it. I'd like to share my findings about Email Spam with you because they might be helpful for your "Spam the spammers" idea.
First, I'd like to stress that Spammers send Spam not because they want to annoy you, but because they want to make money. The business model behind each Spam can be very different. Spam is used to advertise real e-commerce platforms or services, phish for account information, distribute malware, or trick you into, for example, sharing information you shouldn't share. So the purpose of some Spam is to directly drive sales of products, while the purpose of other Spam is to expand the Spammer's resources. Only a fraction of these business models require a reply from the recipient, and most Spam is designed to collect that reply or feedback through communication channels other than Email.
This follows from the very nature of how Spam is distributed. Sending the amount of Spam we currently face requires mechanisms very different from traditional Email infrastructure. Spammers don't use their own infrastructure to send Spam; they use other people's resources: misconfigured Email servers, compromised accounts, vulnerable websites, infected computers, and compromised servers. They get very creative; I have seen printers and TVs send Spam. Using other people's resources often means impersonating a valid sender, and that in turn often means you can no longer reach the original sender of the Spam. Replies to Spam would end up in real people's Email inboxes or would be rejected by real Email infrastructure. The protocol is used purely to get content out into the world, not in the way it was intended.
So, as much as I like the idea of Spamming the Spammers, I think replying to Spam at large scale would hurt the wrong people, who are not even aware that they are sending Spam.
What I noticed in Spam content is that different campaigns often use similar or even identical templates. Templates often combine Spintax with static blocks of text. Of course there are more sophisticated techniques for generating the Spam corpus, but these applied almost exclusively to highly targeted content. High-volume Spammers don't care about quality; they care about volume and throughput. So to kill 90% of the Spam out there, template-based detection mechanisms should be sufficient. I would say the techniques the industry currently uses to detect Spam on the content level are very primitive, and that is why Spam is still an issue. What I would like to see is an approach that detects Spam not on the content level but on the communication level. I noticed that Spammers often use the same underlying software to send out Spam. This software leaves traces in how a Spammer communicates and behaves on the protocol level. There are traces in all layers, from IP and TCP up to the application layer. I would like to see learning mechanisms that learn how Spammers communicate and stop them before their content hits your inbox.
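To make the template idea concrete, here is a minimal sketch of matching messages against a known Spintax template. It assumes the campaign template has already been recovered; the template text and the names `spintax_to_regex` and `is_from_template` are hypothetical and chosen only for illustration. Learning templates from raw Spam samples, which a real system would need, is not covered here.

```python
import re


def spintax_to_regex(template):
    """Compile a Spintax template into a regex matching every variant.

    '{a|b}' becomes the alternation '(?:a|b)'; everything else is
    treated as literal text. Nested groups are supported.
    """
    pattern = []
    depth = 0
    # Split on the three Spintax control characters, keeping them as tokens.
    for token in re.split(r"([{}|])", template):
        if token == "{":
            depth += 1
            pattern.append("(?:")
        elif token == "}" and depth > 0:
            depth -= 1
            pattern.append(")")
        elif token == "|" and depth > 0:
            pattern.append("|")
        else:
            # Static text (or a control character outside any group).
            pattern.append(re.escape(token))
    if depth != 0:
        raise ValueError("unbalanced braces in Spintax template")
    return re.compile("".join(pattern), re.IGNORECASE)


def is_from_template(message, patterns):
    """Return True if the whole message is a variant of any known template."""
    return any(p.fullmatch(message.strip()) for p in patterns)


# Hypothetical campaign template, for illustration only.
TEMPLATE = "{Hi|Hello|Dear} friend, {buy|order} {cheap|discount} pills {now|today}!"
KNOWN_TEMPLATES = [spintax_to_regex(TEMPLATE)]

print(is_from_template("Hello friend, order cheap pills today!", KNOWN_TEMPLATES))
print(is_from_template("Hello friend, see you at lunch today!", KNOWN_TEMPLATES))
```

Exact full-message matching is deliberately strict. A practical detector would probably tolerate small edits, for example by matching only the static blocks or by using a fuzzy comparison.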
I don't know where I should put this. It's not a PR because I haven't built anything yet. I want to clarify some points first and lay out my thoughts and procedures before I decide whether to actually build this thing. It is possible that I am completely missing the point of this challenge.
So my questions are:
Don't get me wrong. I am somewhat interested in this problem, and I think it's a solvable one, but I think the solutions I have in mind may not be the solutions you would prefer to see.
I wonder whether you have considered adding a project on intrinsic motivation. Here are a few papers that have stood out to me:
I'm currently working in this area and have worked on practical variations of the Causal Entropic Forces paper in the past. Personally, I think that this field has a lot of potential both in terms of practical and theoretical contributions.
For concrete projects, I have several experiments and extensions/generalisations of the above papers in mind, and I'm working on such an extension, but before making this issue longer it would be useful to know whether this research area interests OpenAI.
We should add an "author" section to each problem.
What file/directory should we state solutions in?
We have already worked on DQN+RAM (http://arxiv.org/abs/1605.01335).