Approval Queue and User Verification
Mikey
Moderator
Pone of Astronomy
@☬ lincolnbrewsterfan ☬
Please try to keep posts related to the topic^^
Background Pony #F197
The queue is needed so that staff members can make sure that no IRL gore or child abuse imagery can ever see the light of the day. We only check if the image blatantly breaks Rule #5, nothing else.
From my days on numerous forums, 100% manual approval is a really, really bad idea, for both users, site and moderators. How about a compromise: image sits in queue 12 hours, and if moderators don’t react, image will be automatically published without approval?
For all newer users, once you reach a certain amount of approved uploads, you will be reviewed by the staff members and be granted verification.
Again, manual stuff is a bad idea, users will be missed (or “missed”) all the time, we already know it to be objectively true from the whole “manual badge granting” thing. Number of published uploads and account age is enough for auto-approval, no manual “review” is needed. Maybe check for sum of all scores of uploaded images if you want to be thorough.
So, we agree on compromise?
koschyy
@The Smiling Pony
I don’t know for sure what will work, but I believe that it should at least be experimented with.
In addition, modern machine learning tools have built-in methods to correct for class imbalance and adjust the margin of error that is tolerable. However, there are cases where too little of a tolerance for error would cause algorithms to overanalyze the data and generate edge cases for every single outlier.
Many of these tools are also widely available and open source. The one I am using right now for work is very versatile and adaptable to various needs for both classification and regression.
It is likely impossible to have an ML system be 100% accurate, but I think it would beat having to have humans who have to eat and sleep on occasion monitor the system and approve of every last request. Plus, a degree of human error could also exist in approving the posts and site users here may get impatient if it takes hours to process their requests.
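To make the class-imbalance and error-tolerance points concrete, here is a minimal sketch in plain Python. It is not tied to any particular tool: the label names, the 98/2 split, and the threshold values are all made up for illustration. It shows inverse-frequency class weights (so the rare class counts for more during training) and a decision threshold that can be lowered to trade more false positives for fewer false negatives.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def needs_review(score, threshold):
    """Send an item to human review when its model score reaches the threshold."""
    return score >= threshold


# Hypothetical, heavily imbalanced label set: 98 ordinary items, 2 rare ones.
labels = ["ok"] * 98 + ["flagged"] * 2
weights = balanced_class_weights(labels)

# The rare class gets a much larger weight than the common one.
print(weights)

# Lowering the threshold catches more borderline items at the cost of
# more false positives.
print(needs_review(0.3, threshold=0.5), needs_review(0.3, threshold=0.2))
```

A lower threshold fits a hybrid setup best, where anything the model is unsure about still goes to a human.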
GlitchedWolf
Moderator
Derpbooru Free Edition!?
@Background Pony #F197
It isn’t 100% manual; whitelisted users don’t go through the approval queue.
The Smiling Pony
( ͠° ͟ʖ ͡° )
@koschyy
The thing is, humans can be trusted fairly well to be able to look at an image and say “yep, that’s genuine article CP” and nuke it. And given the subject matter, any false-negative is a huge issue.
As well, false positives would lead to the same issue of people being impatient about their upload not showing up. To note, again, that upload approval only applies to people that aren’t logged-in, and to new accounts that haven’t been whitelisted; the vast majority of uploads on the site come from a small number of users and a long tail of existing users (posting as-anon or not), with logged-out and new users being a small minority of daily uploads.
And fairly importantly… how exactly would we train and test a system to detect and flag CP, without using actual CP? We’d need a fairly large amount of varied, high quality, real data… And I think you can see the problems with that.
Background Pony #F197
@TheGlitchedWolf
In all cases where approval is needed, it’s 100% manual, including the whitelist itself. I suggest auto-approval in cases where posts are neglected for too long.
The Smiling Pony
( ͠° ͟ʖ ͡° )
@Background Pony #F197
Any automated system can be easily gamed.
The system would only impact people uploading without an account and new users. The former can just accept that sometimes their upload may take a few more minutes to show up, and the latter will eventually get whitelisted (and there’s an audited queue for that, if you want to check the code repo… it’s nothing at all like badges).
The process is also meant to exclusively have staff check only if an image is a clear R#5 breaker or not; anything else gets approved and can be moderated later if need be, even if egregious for whatever reason. It’s important that this system not get bogged down with “this might be a DNP or #0 or whatever so I’ll let it sit while I check”.
koschyy
The thing is, humans can be trusted fairly well to be able to look at an image and say “yep, that’s genuine article CP” and nuke it. And given the subject matter, any false-negative is a huge issue.
And therein lies the problem. One particular MLP image site whose name I would not like to mention for ethical reasons once had an issue where a CP thread was left up for five or so hours because the mods fell asleep. There is also the possibility that human moderators on any site become corrupt and accept bribes to leave otherwise forbidden content on the site, or that an ambiguous case goes the wrong way.
As well, false positives would lead to the same issue of people being impatient about their upload not showing up. To note, again, that upload approval only applies to people that aren’t logged-in, and to new accounts that haven’t been whitelisted; the vast majority of uploads on the site come from a small number of users and a long tail of existing users (posting as-anon or not), with logged-out and new users being a small minority of daily uploads.
Perhaps a human review process could occur for false positives, just as they do on YouTube. The training data could then be updated to reflect the true results.
And fairly importantly… how exactly would we train and test a system to detect and flag CP, without using actual CP? We’d need a fairly large amount of varied, high quality, real data… And I think you can see the problems with that.
Apple is using the Amber Alert register to try to detect missing children, and another piece of software called iCOP uses the Gnutella network and a service called Microsoft PhotoDNA to help examine image signatures without necessarily having to get people to look at the images themselves. Something similar could perhaps be done here.
Background Pony #F197
@The Smiling Pony
I don’t mean an AI or something. I mean giving the approval queue a hard deadline of 24 hours, so an image gets forcibly approved and published when time’s up and moderators haven’t reacted in time.
For user accounts, let the site engine whitelist them once conditions are met (age, upload count, comment count, uploaded images’ score).
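As a rough sketch of what that proposal could look like in code: only the 24-hour deadline and the four criteria come from the post above. Every threshold, field name, and function name below is a hypothetical placeholder, and none of this reflects how the site engine actually works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

QUEUE_DEADLINE = timedelta(hours=24)   # deadline proposed in the post

# Hypothetical thresholds, chosen only for illustration.
MIN_ACCOUNT_AGE = timedelta(days=30)
MIN_APPROVED_UPLOADS = 10
MIN_COMMENTS = 20
MIN_TOTAL_UPLOAD_SCORE = 50


@dataclass
class Account:
    created_at: datetime
    approved_uploads: int
    comments: int
    total_upload_score: int


def queue_deadline_passed(submitted_at, now):
    """True once an upload has waited in the queue for the full deadline."""
    return now - submitted_at >= QUEUE_DEADLINE


def qualifies_for_whitelist(account, now):
    """True when the account meets every one of the listed conditions."""
    return (
        now - account.created_at >= MIN_ACCOUNT_AGE
        and account.approved_uploads >= MIN_APPROVED_UPLOADS
        and account.comments >= MIN_COMMENTS
        and account.total_upload_score >= MIN_TOTAL_UPLOAD_SCORE
    )
```

Requiring all conditions at once, rather than any single one, makes it harder to qualify by farming just one number.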
koschyy
@Background Pony #F197
I doubt something like that would be allowed if there is a risk of the site being shut down. If you read my previous comment, then you’ll find out that there may be issues with such things being implemented.
Background Pony #F197
there is a risk of the site being shut down.
If moderators don’t react in the current system, the site is already effectively shut down. No new users, no new uploads.
GlitchedWolf
Moderator
Derpbooru Free Edition!?
@Background Pony #F197
We’re not going anywhere, there will still be new uploads, especially with the whitelist system.
@Shimmering Spectacle
That, and new users.
koschyy
@Background Pony #F197
Which is why I advocate for machines doing the work, though it is possible that humans may be necessary in the early stages of deployment to make absolutely certain that everything goes well.
The Smiling Pony
( ͠° ͟ʖ ͡° )
@Background Pony #F197
I’m not sure what scenario you’re envisioning where no mod checks the “did someone upload illegal content?” queue for long enough that it causes a notable downturn in the number of new images…
It isn’t a moderation/report queue, or even a “are the tags right” queue; it’s an immediate check only to see if an image is CP/IRL gore or not, and it doesn’t affect that many images. There isn’t going to be a sudden decline in the online availability of staff to check stuff, this just helps in being able to block illegal shit before average users get exposed to it, at the cost that maybe 10% of daily uploads appear 5-15 minutes late instead of the normal 2.
koschyy
@The Smiling Pony
In order to make absolutely certain that no users get inconvenienced and no illicit content makes it through, they would need to either work in shifts that could be somewhat inconvenient or operate across time zones.
AppleDash
Moderator
Purple Pega
@koschyy
Well, good news for you - we have mods in many time zones who regularly use the site, and they have been around within minutes to handle all past cases of direct CP uploads before this queue was put into place. Right now, the queue is empty, and the two times I’ve seen it sitting at 1 image I approved it in about 2 seconds.
Dragonpone
Moderator
Badge Dragon
@AppleDash
I was just about to reply I hadn’t gotten to do any, and right as I was typing I got to approve my first one.
Yippee!
Ciaran
Senior Moderator
君場森生きる
@Dragonpone
Yeah, once you do a few it’s pretty fast. It’s like the old ‘Ham or Spam’ queue. See a thing? Click a thing. Done.
koschyy
@Wiimeiser
@Ciaran
I believe it may still be at least a good idea to try and experiment with the idea, especially since human approval could also be prone to similar errors. I think that acquiring more training data and tweaking the parameters of the algorithm itself could yield better results, at least based on my own experience in the field, but a hybrid system like YouTube’s would probably also be a good idea to make sure everything gets off the ground smoothly.
AppleDash
Moderator
Purple Pega
@Wiimeiser
It’s worth noting that the reverse search doesn’t use anything approaching machine learning, it’s just a dumb quadrant intensities search.
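For anyone wondering what a quadrant-intensity comparison might look like in general, here is a minimal sketch. It is only a guess at the idea and not the site's actual reverse search code: the function names, the grayscale-grid input, and the tolerance value are all assumptions.

```python
def quadrant_intensities(pixels):
    """Mean intensity of the four quadrants (NW, NE, SW, SE).

    `pixels` is a 2D list of grayscale values, at least 2x2 in size.
    """
    height, width = len(pixels), len(pixels[0])
    mid_y, mid_x = height // 2, width // 2

    def mean(rows, x0, x1):
        values = [v for row in rows for v in row[x0:x1]]
        return sum(values) / len(values)

    top, bottom = pixels[:mid_y], pixels[mid_y:]
    return (
        mean(top, 0, mid_x),
        mean(top, mid_x, width),
        mean(bottom, 0, mid_x),
        mean(bottom, mid_x, width),
    )


def looks_similar(a, b, tolerance=5.0):
    """Two images match when every quadrant mean differs by at most `tolerance`."""
    pairs = zip(quadrant_intensities(a), quadrant_intensities(b))
    return all(abs(x - y) <= tolerance for x, y in pairs)
```

Nothing here is learned from data. It is a handful of averages and a fixed tolerance, so it can find near-duplicates but says nothing about what an image depicts.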