Critical issues presentations/Detecting Copyright Concerns in Near Real Time

Detecting Copyright Concerns in Near Real Time

Video

Submission no. 29
Title of the submission

Author of the submission

James Heilman

Country of origin

Canada

Topics

Outreach, Projects, Technical

Keywords

Copyright
WikiProject Med Foundation
Wikimedia Israel
Collaboration

Abstract

An ongoing problem is people adding copyrighted material to Wikipedia. When this occurs and the material is not removed quickly, it puts our shared brand at risk. Often it is done not with malicious intent but simply because of a misunderstanding of copyright. Occasionally, editors in obscure topic areas make tens of thousands of concerning edits before the issues are noticed.

We began discussing potential technological methods to help the editing community address this problem back in 2012. A partnership was formed with Turnitin (in essence, they agreed to give us free access to their API).

At Wikimania in London, a community programmer (User:Eran) joined our team and hacked together a simple bot. Since then, we have been working to improve the bot's functioning and to build a community that addresses its output.

This presentation will discuss not only the internal workings of the bot, for the technical crowd, but also our efforts to build a community around it. We will also discuss the accuracy of its results and possible ways to improve them, as well as the possibility of getting the bot up and running in languages other than English.

Result

Accepted

Slides

The presentation slides are available here and are under a CC BY-SA license.