Authorship Attribution Benchmark issues
https://git.uibk.ac.at/csak8736/authbench/-/issues
2021-06-30T14:06:57+02:00

https://git.uibk.ac.at/csak8736/authbench/-/issues/6
Check why the cl-1 novels have so many more chunks per author than cl-2
2021-06-30T14:06:57+02:00
User expired
![datasets](/uploads/7c721ff5667e21a73ab0a6ba6e8f714c/datasets.png)

https://git.uibk.ac.at/csak8736/authbench/-/issues/5
Leave-One-Language-Out is not suitable
2021-06-30T14:59:19+02:00
User expired
In the Novels datasets, the "leave-one-language-out" strategy is not suitable, as it leaves out entire authors who have all novels in common.
However, "leave-one-novel-out" will lead to unnecessarily duplicated training runs.
Instead,

https://git.uibk.ac.at/csak8736/authbench/-/issues/4
Fix PAN18 dataset unify script
2021-05-26T09:11:50+02:00
User expired
The PAN18 Unify script assigns identical file names to training and testing documents and has probably overwritten the content of one of those categories.

https://git.uibk.ac.at/csak8736/authbench/-/issues/3
Check All Dataloaders with check_explicit_splits
2021-06-30T13:10:56+02:00
User expired

https://git.uibk.ac.at/csak8736/authbench/-/issues/2
Complete Calculations
2021-05-21T14:10:33+02:00
User expired
Complete calculations for all corpora:
CPU
* [ ] Reddit (limited)
* [x] Reddit (unlimited)
* [x] Reuters (limited)
* [x] Reuters (unlimited)
* [ ] IMDb (limited)
* [x] IMDb (unlimited)
* [x] CMCC
* [x] Guardian
* [x] Novels
* [x] PAN18
GPU
* [ ] Reddit (limited)
* [x] Reddit (unlimited)
* [ ] Reuters (limited)
* [ ] Reuters (unlimited)
* [ ] IMDb (limited)
* [ ] IMDb (unlimited)
* [ ] CMCC
* [ ] Guardian
* [ ] Novels
* [ ] PAN18

https://git.uibk.ac.at/csak8736/authbench/-/issues/1
German never test language on cl_novels_1
2021-05-21T14:07:57+02:00
User expired
"German" does not seem to be used as a test language, or all results are zero.
![maim_2021-05-21--11-57-23](/uploads/0c4cd5ac88c0c2d8dc1b2d8c96993460/maim_2021-05-21--11-57-23.png)
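
One way to narrow down the report above might be to list the languages that never show up as a non-empty test fold. The following is only a minimal sketch: the `(language, text)` sample layout and the function names `leave_one_language_out` and `untested_languages` are assumptions made for illustration and are not taken from the authbench code.

```python
def leave_one_language_out(samples):
    """Yield (held_out_language, train_indices, test_indices) per language.

    `samples` is a list of (language, text) pairs. This layout is a
    hypothetical stand-in, not the actual authbench data format.
    """
    languages = sorted({lang for lang, _ in samples})
    for held_out in languages:
        train = [i for i, (lang, _) in enumerate(samples) if lang != held_out]
        test = [i for i, (lang, _) in enumerate(samples) if lang == held_out]
        yield held_out, train, test


def untested_languages(samples, folds):
    """Return the languages that never appear as a non-empty test fold."""
    tested = {lang for lang, _, test in folds if test}
    return sorted({lang for lang, _ in samples} - tested)


# Tiny illustrative corpus (made-up placeholder texts).
samples = [("German", "a"), ("English", "b"), ("Spanish", "c"), ("German", "d")]
folds = list(leave_one_language_out(samples))
missing = untested_languages(samples, folds)
```

If `missing` is empty, every language was held out at least once, so all-zero scores for one language would point at the evaluation or plotting step rather than at the split itself.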