😾 Oskar So, does anybody have a data-driven project in mind that they need collaboration on? Mining, analysis, classification, querying, etc. Commercial or not.
📚 Terry Mcginnis I've been thinking about how to do automatic moderation and personal feed curation. The idea is to look at incoming data like tweets, news stories, comments, links from an RSS feed, etc. Then train a model or several such models in a continuous loop to predict whether I will like something or not. I'm looking at tensorflow.js but still early stage.
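A minimal sketch of that continuous-loop idea, in stdlib Python rather than tensorflow.js (the class name and the word-count features are illustrative assumptions, not Terry's actual design): an incrementally updated naive Bayes classifier that learns like/dislike from each piece of feedback.

```python
from collections import defaultdict
import math

class OnlineLikePredictor:
    """Hypothetical sketch: naive Bayes over item text, updated one item at a time."""

    def __init__(self):
        self.word_counts = {True: defaultdict(int), False: defaultdict(int)}
        self.class_counts = {True: 0, False: 0}
        self.vocab = set()

    def update(self, text, liked):
        """Fold one labeled item into the model (the 'continuous loop' step)."""
        self.class_counts[liked] += 1
        for word in text.lower().split():
            self.word_counts[liked][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        """Return True if the model scores the text as 'liked'."""
        total = sum(self.class_counts.values())
        scores = {}
        for label in (True, False):
            # Laplace-smoothed log-probabilities to avoid zero counts
            prior = (self.class_counts[label] + 1) / (total + 2)
            score = math.log(prior)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return scores[True] > scores[False]
```

The same update/predict loop shape would carry over to a tensorflow.js model; only the model inside changes.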
😾 Oskar I think this is sort of every _computer_ person's project idea, with differences in scale and execution. You could constantly _teach_ your model and fine-tune it towards personal preference; however, that could be a hard task in itself. I think this is more like 'recommend' than 'predict'. Still, if constantly fine-tuned on streamed data, the recommendations would evolve. Tbh, I'd consider it a complex project, and the reason why people mostly default to categorizing their content well.
📚 Terry Mcginnis I haven't set up the pipeline yet, so I can't tell you if it's complicated or not, but the idea is simple enough. I looked at how neural networks can be used for clustering; the initial model would just do basic clustering, taking headlines and bucketing them by topic like tech, science, politics, fashion, math, physics, etc. I found some relevant results like SOMs: en.wikipedia.org/w.... I have an idea of how to do it slightly differently.
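Not a SOM, but the bucketing step could be prototyped much more crudely first. A stdlib Python sketch (the per-topic seed headlines are made-up examples, and word overlap is a deliberately naive stand-in for a learned embedding): assign each headline to the topic whose seed it shares the most words with.

```python
def jaccard(a, b):
    """Word-overlap similarity between two headlines, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def bucket_headlines(headlines, seeds):
    """Assign each headline to the topic whose seed headline it overlaps most.

    seeds: dict mapping topic name -> a representative seed headline.
    Returns dict mapping topic name -> list of assigned headlines.
    """
    buckets = {topic: [] for topic in seeds}
    for h in headlines:
        best = max(seeds, key=lambda t: jaccard(h, seeds[t]))
        buckets[best].append(h)
    return buckets
```

A trained model would replace `jaccard` with a learned similarity, but the surrounding bucketing logic stays the same.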
🤔 David Hehe, based on some of the discussions yesterday around scraping subreply I went and bought metareply.net. It'd be fun to have a community project to play with! If you're interested in working on it here and there let me know :) and that goes for anyone else too!
😾 Oskar Yeah, I can put some work in. There are a few problems, however. We don't know the roadmap for Subreply, so it may very well be that we will be reinventing the wheel along the way. It would be great if some parts of it could be open sourced. Anyway, do you have anything particular in mind? I'm for discussing it openly here; maybe somebody will chip in.
🤔 David I figure it'd be easy to scrape at regular intervals and then use the collected data to update some figures, such as: average posts per hour, content length, active users, URLs linked, and active threads.
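Those figures fall out of one pass over the scraped posts. A stdlib Python sketch, assuming each scraped post is a dict with `author`, `text`, `timestamp`, and `thread` fields (those field names are assumptions, not Subreply's actual schema):

```python
import re
from datetime import datetime

def compute_metrics(posts):
    """Compute David's listed figures from a list of scraped post dicts."""
    if not posts:
        return {}
    times = sorted(p["timestamp"] for p in posts)
    # Avoid division by zero when all posts land within the same hour
    hours = max((times[-1] - times[0]).total_seconds() / 3600, 1.0)
    # Only catches explicit http(s) links; bare domains would need a looser pattern
    url_pattern = re.compile(r"https?://\S+")
    return {
        "posts_per_hour": len(posts) / hours,
        "avg_content_length": sum(len(p["text"]) for p in posts) / len(posts),
        "active_users": len({p["author"] for p in posts}),
        "urls_linked": sum(len(url_pattern.findall(p["text"])) for p in posts),
        "active_threads": len({p["thread"] for p in posts}),
    }
```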
😾 Oskar dvaun, have you collected anything? I'm thinking about topic/reply discovery for Subreply. This should probably be an internal feature, but nevertheless: set up a crawler on _Search_ every N minutes, collect posts, categorize them as topic or reply, measure activity (no. of replies, time, other metadata), and present it in some minimal-style dashboard (maybe put it on GitHub and just run it locally to simplify things?). Thinking about it now, every _scrapable_ tab can provide potentially useful metrics.
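The "every N minutes" part is just a polling loop. A stdlib Python sketch (the scrape callback is a placeholder; there's no real Subreply client here, and `iterations` exists only so the loop is testable):

```python
import time

def run_every(seconds, task, iterations=None):
    """Call task() every `seconds` seconds; iterations=None means loop forever."""
    count = 0
    while iterations is None or count < iterations:
        task()  # e.g. fetch the Search tab, parse posts, store them
        count += 1
        if iterations is None or count < iterations:
            time.sleep(seconds)
```

Usage would be something like `run_every(300, scrape_search)` for a 5-minute interval, where `scrape_search` is whatever fetch-and-store function you write; cron or a systemd timer would do the same job without keeping a process alive.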
🤔 David (RE: subreply.com/ow/hgb) I'm not scraping yet -- I've been swamped at work. Planning on getting up and running this weekend. As for the crawler scheduling, that seems simple enough! I'm happy to get a server up and running to host this; I even thought it might be a fun project to experiment with some lower-level server frameworks (the two on my mind are actix/Rust and drogon/C++). Are you interested in that? And what about publicizing the code?
😇 Jesus Christ yes, you can use 'last seen' to get retention
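Assuming the crawler can record a 'last seen' timestamp per user, a retention figure is a one-pass computation. A stdlib Python sketch (the dict-of-timestamps shape and the 7-day default window are assumptions):

```python
from datetime import datetime, timedelta

def retention_rate(last_seen, now, window_days=7):
    """Fraction of known users whose 'last seen' falls within the window.

    last_seen: dict mapping username -> datetime of their last activity.
    """
    if not last_seen:
        return 0.0
    cutoff = now - timedelta(days=window_days)
    active = sum(1 for ts in last_seen.values() if ts >= cutoff)
    return active / len(last_seen)
```

Running this per scrape and plotting the series over time would show whether the user base is sticking around or churning.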