Detecting Copyright Concerns in Near Real Time
- Submission no. 29
- Title of the submission
- Author of the submission
- Country of origin
Canada
- Topics
Outreach, Projects, Technical
- Keywords
- Copyright
- WikiProject Med Foundation
- Wikimedia Israel
- Collaboration
- Abstract
An ongoing problem is people adding copyrighted material to Wikipedia. When this occurs and the material is not rapidly removed, it puts our shared brand at risk. Often it is done not with malicious intent but simply out of a misunderstanding of copyright. Occasionally, editors in obscure topic areas make tens of thousands of concerning edits before the issues are noticed.
We began discussing potential technological methods to help the editing community address this problem back in 2012. A partnership was formed with Turnitin, which agreed to give us access to its API free of charge.
At Wikimania in London, a community programmer (User:Eran) joined our team and hacked together a simple bot. Since then we have been working to improve the bot's functioning and to build a community to act on its output.
This presentation will discuss not only the internal workings of the bot, for the technical crowd, but also our efforts to develop a community around it. We will discuss the accuracy of the results and possible methods to improve it. Finally, we will cover the possibility of getting this bot up and running in languages other than English.
- Result
Accepted
Slides
The presentation slides are available here and are released under a CC BY-SA license.