2016-11-01

‎WT article import: new section

Revision as of 01:55, 1 November 2016

Line 677:

::Including literally everything (article edits, talk page edits, project pages, edits which were reverted or deleted, bots, everything)? We're just under a million edits from 2013 to now, based on the revision numbers https://en.wikivoyage.org/w/index.php?title=Wikivoyage%3AWikivoyage_and_Wikitravel&type=revision&diff=3055593&oldid=2072055 indicating two million edits were imported or made just after the time of the fork and one million were made here later. Any edit to anything - even if it's later deleted - bumps the revision counter up by one, so these get racked up rather quickly. So an average of a quarter million edits annually over the eight years (2004-2012) with WV continuing at that rate (a million from 2013-2016). Nonetheless, since the split, for every two edits made at ''that other wiki'' five are made here. [[User:K7L|K7L]] ([[User talk:K7L|talk]]) 14:11, 14 October 2016 (UTC)

:::Well I for one think that "our post fork edits outnumber theirs five to two" is more impressive than "two similar numbers in the three million range". [[User:Hobbitschuster|Hobbitschuster]] ([[User talk:Hobbitschuster|talk]]) 15:21, 14 October 2016 (UTC)

== WT article import ==

{{swept}}

I managed to get a list of articles on WT that we're missing. There's about 2900 of them. I placed all the red links in [[User:Acer/WT|my user space for reference]].

I also have the XML dumps with full history for those pages: 58 MB uncompressed. I read the policy pages and I know importing is discouraged for SEO reasons, but I'm not sure it's really that relevant at this point. It's a small subset (there's 47000 total on WV) and we're adding new articles much faster than they are. There's about 3500 more articles on WV than WT right now. Since they have those 2900 unique to them, that means 6400ish unique articles have been written here since the fork.

I'd also volunteer to curate the material, place the pages in my userspace at first and check each one individually to see if they're worth keeping. I had been doing similar work on EnWiki with a large list of machine-translated pages. WT gets new contributors from time to time, well-meaning people who don't know the background story. I dislike the idea that all their work will be wasted when that place inevitably gets overrun by spam bots. It already is to some extent... Is this something that could be considered? [[User:Acer|Acer]] ([[User talk:Acer|talk]]) 22:23, 16 September 2016 (UTC)

:I think the duplicate content penalty is definitely still an issue. We're making headway as the two sites' content diverges, but it's slow going and I would fear that importation would reverse some of those gains. Better to treat that list as a list of [[Wikivoyage:Requested articles|requested articles]] and write them from scratch. [[User:LtPowers|Powers]] <small><sup>([[User talk:LtPowers|talk]])</sup></small> 01:48, 17 September 2016 (UTC)

::{{Ping|Acer}} {{Ping|LtPowers}} I agree that before we mass add content from Wikitravel we keep it somewhere separate. First off, it would be helpful to do some (semi-)automated replacements just for SEO purposes (like "extremely"->"very" and "tasty"->"delicious") and then have editors who are willing to look through the content individually before uploading. I would be willing to assist. —[[User:Koavf|Justin (<span style="color:grey">ko'''a'''vf</span>)]]<span style="color:red">❤[[User talk:Koavf|T]]☮[[Special:Contributions/Koavf|C]]☺[[Special:Emailuser/Koavf|M]]☯</span> 04:09, 17 September 2016 (UTC)
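The semi-automated replacement suggested above could be sketched roughly as follows. This is only an illustration: the `SYNONYMS` table holds just the two word pairs mentioned in the comment, the function name `swap_synonyms` is hypothetical, and the sketch makes no attempt to skip wikilinks, templates or listing fields, which a real run would need to protect.

```python
import re

# Only the two pairs mentioned in the discussion; a real table would be longer.
SYNONYMS = {"extremely": "very", "tasty": "delicious"}

# Match whole words only, case-insensitively.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SYNONYMS)) + r")\b", re.IGNORECASE
)


def swap_synonyms(text):
    """Replace each listed word with its synonym, keeping a leading capital."""

    def repl(match):
        word = match.group(0)
        new = SYNONYMS[word.lower()]
        return new.capitalize() if word[0].isupper() else new

    return PATTERN.sub(repl, text)


sample = "Extremely tasty food is served here."
print(swap_synonyms(sample))
```

Because the pattern uses word boundaries, longer words that merely contain a listed word are left alone. Every replacement would still need a human look, since a blind swap can change meaning.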

::: There has been a great deal of work to up the quality of content on Wikivoyage, so my first reaction to this would be a negative one. However, the idea of a list of needed articles is not a bad one. Could you remove all the redirect pages from the list and also order it by country? --[[User:Traveler100|Traveler100]] ([[User talk:Traveler100|talk]]) 06:43, 17 September 2016 (UTC)

::::{{Ping|Traveler100}} For what it's worth, I am working on redirects and removing them from the userpage now. —[[User:Koavf|Justin (<span style="color:grey">ko'''a'''vf</span>)]]<span style="color:red">❤[[User talk:Koavf|T]]☮[[Special:Contributions/Koavf|C]]☺[[Special:Emailuser/Koavf|M]]☯</span> 07:04, 17 September 2016 (UTC)

:::::...And also [http://wikitravel.org/en/Special:Log/Koavf deleting items at Wikitravel] which should have been gone a while ago there as well. —[[User:Koavf|Justin (<span style="color:grey">ko'''a'''vf</span>)]]<span style="color:red">❤[[User talk:Koavf|T]]☮[[Special:Contributions/Koavf|C]]☺[[Special:Emailuser/Koavf|M]]☯</span> 07:12, 17 September 2016 (UTC)

{{Ping|LtPowers}} I just ran a search for #redirect on the XML dump and got 1259 hits. Then there are going to be empty husks, one-liners, inadequate/inappropriate/unsuitable articles, spam pages, and I even saw a misplaced user page. By the time we finish combing through, I reckon there might be fewer than a thousand pages left. Given the SEO concern, we could be even stricter in accepting these than we are with original pages here. And we'll implement Koavf's suggestion to replace words with synonyms. If we end up with a few hundred higher quality pages, I feel that's a fair trade for a small penalty, if any. In fact, I think our biggest problem is the number of sites that link to us. Per Alexa [http://www.alexa.com/siteinfo/wikitravel.org WT has incoming links from almost 17000 sites], [http://www.alexa.com/siteinfo/wikitravel.org we only have 2000...]
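A raw text search for #redirect counts every matching revision, so a per-page count can differ from the number of hits. A minimal sketch of a per-page split is below. It assumes a standard MediaWiki XML export in which revisions are listed oldest first, so the last `<text>` of each page is the current one; the function name `split_redirects` and the inline `SAMPLE` (with made-up titles) are hypothetical and stand in for the real dump.

```python
import io
import re
import xml.etree.ElementTree as ET

# A page counts as a redirect if its current text starts with "#redirect".
REDIRECT_RE = re.compile(r"^\s*#redirect", re.IGNORECASE)


def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def split_redirects(source):
    """Return (redirect_titles, article_titles) from an export stream."""
    redirects, articles = [], []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, last_text = None, ""
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                # Later revisions overwrite earlier ones (assumed oldest first).
                for sub in child:
                    if local(sub.tag) == "text":
                        last_text = sub.text or ""
        target = redirects if REDIRECT_RE.match(last_text) else articles
        target.append(title)
        elem.clear()  # free the page's revisions once it has been classified
    return redirects, articles


# Made-up two-page sample standing in for the real dump.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Example town</title>
    <revision><text>Example town is a placeholder article.</text></revision>
  </page>
  <page><title>Old name</title>
    <revision><text>Placeholder stub.</text></revision>
    <revision><text>#REDIRECT [[Example town]]</text></revision>
  </page>
</mediawiki>"""

redirects, articles = split_redirects(io.BytesIO(SAMPLE.encode("utf-8")))
print(len(redirects), "redirects;", len(articles), "pages left to review")
```

For the real file, the `io.BytesIO(...)` argument would be replaced by the opened dump; `iterparse` reads it as a stream, so the whole file does not have to be loaded at once.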

{{Ping|koavf}} Yes, I already did some find/replace on the raw XML file. I replaced some templates, added edit comment attribution to each revision and modified usernames to include a WT prefix. While it would be very easy and quite practical to do what you suggest on the XML itself, it would break attribution. But it can be done with a script once the articles are uploaded (if there's agreement for importing).
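The username prefix and edit-comment attribution described above could be applied along these lines. This is a sketch under assumptions, not the find/replace that was actually run: the export schema version in `NS`, the function name `mark_imported` and the sample data are hypothetical, the prefix follows the "(WT-en) " form used at migration, and the note text follows the edit summary quoted in this thread.

```python
import xml.etree.ElementTree as ET

NS = "http://www.mediawiki.org/xml/export-0.10/"  # assumed schema version
PREFIX = "(WT-en) "                               # prefix used at migration
NOTE = "(Import from wikitravel.org/en)"          # attribution edit summary


def tag(name):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{NS}}}{name}"


def mark_imported(xml_text):
    """Prefix usernames and add the attribution note to every revision."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(xml_text)
    for rev in root.iter(tag("revision")):
        contributor = rev.find(tag("contributor"))
        user = contributor.find(tag("username")) if contributor is not None else None
        # Anonymous edits carry <ip> instead of <username> and are left alone.
        if user is not None and user.text and not user.text.startswith(PREFIX):
            user.text = PREFIX + user.text
        comment = rev.find(tag("comment"))
        if comment is None:
            comment = ET.Element(tag("comment"))
            comment.text = NOTE
            # Placed right after <contributor>; exact schema order not checked.
            position = list(rev).index(contributor) + 1 if contributor is not None else len(rev)
            rev.insert(position, comment)
        elif NOTE not in (comment.text or ""):
            comment.text = f"{comment.text or ''} {NOTE}".strip()
    return ET.tostring(root, encoding="unicode")


# Made-up sample: one named edit with a summary, one anonymous edit without.
SAMPLE = f"""<mediawiki xmlns="{NS}">
  <page><title>Example town</title>
    <revision>
      <contributor><username>ExampleUser</username></contributor>
      <comment>fix typo</comment>
      <text>Example text.</text>
    </revision>
    <revision>
      <contributor><ip>192.0.2.1</ip></contributor>
      <text>Example text, edited.</text>
    </revision>
  </page>
</mediawiki>"""

result = mark_imported(SAMPLE)
print(result)
```

Only the metadata is touched, and running it twice does not double the prefix or the note. The revision text is left unchanged, in line with the point above that editing the text itself in the dump would break attribution.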

{{Ping|Traveler100}} I share your concern with article quality; that's why I'm proposing importing all articles into my userspace at first. Then we can work through the list and decide on what's worth keeping. I'll commit to checking each one myself and doing any necessary fixes (see below). I'll see what I can think of to organize the list the way you suggested. I can't do it with simple terminal commands. It will need a script, I think.

So, taking everyone's comments into consideration and the concerns about importing, I'll ask just to be allowed to import into my userspace (user talk actually, as it's non-indexed) at first so there will be no quality or SEO concerns. [[User:Koavf]], anyone else who wants to help and I would comb through and produce a much smaller list of higher quality articles to be considered for permanent importation. We'll replace words with synonyms to lessen the SEO impact. We would then submit this refined list for evaluation and acceptance by the community. Nothing gets moved to article space before that. Would that alleviate some of the concerns? [[User:Acer|Acer]] ([[User talk:Acer|talk]]) 10:23, 17 September 2016 (UTC)

:I don't think there's an issue with importing to your user space as an experiment. I do agree with a "quality" standard, though. Apart from the SEO concerns, there has also been ample discussion about the many very small outlines. Even with a minimal intro and a listing or two, such outlines are not considered a gain by all. While we don't delete the existing ones, it has always been discouraged to mass-create them. The same would be possible by importing from Wikipedia, for example. I do think creating hundreds of such small articles should be avoided in this case too, then. So I think focussing on the larger, higher quality articles is the best way to go. Nice to see that list, though :) [[User:JuliasTravels|JuliasTravels]] ([[User talk:JuliasTravels|talk]]) 11:29, 17 September 2016 (UTC)

:: Great, we're in agreement. Let's see if others are on board. Thanks! [[User:Acer|Acer]] ([[User talk:Acer|talk]]) 14:11, 17 September 2016 (UTC)

::: I have a number of thoughts about this effort but due to past history would prefer to stay out of this discussion. The only comment I would make is that if this effort does move forward there cannot be any failure whatsoever in ensuring that the imports comply 100% with every line of the CC BY-SA license. If there is ANY question about whether correct attribution has been provided, then the imports should be deleted as quickly as possible. -- [[User:Wrh2|Ryan]] 22:22, 17 September 2016 (UTC)

:::: I followed the model used when the fork happened, attribution in the edit summary of each revision ''(Import from wikitravel.org/en)''. I also added a WT prefix to usernames to differentiate them from any possible duplicate here. This was also done back then. A copy of the license is linked to at the bottom of every page here already and any changes we make after importing will be recorded in the history. That I think covers all attribution requirements in the license. The ShareAlike requirements are fully covered also. Did I miss anything? [[User:Acer|Acer]] ([[User talk:Acer|talk]]) 22:58, 17 September 2016 (UTC)

:::::"(WT-en) " is the preferred username prefix; that matches what we used for migration. I guess my main concern is that doing this could lead to WT doing the reverse. Sure, we know they already have the legal right to do it, but why encourage it or tip them off? [[User:LtPowers|Powers]] <small><sup>([[User talk:LtPowers|talk]])</sup></small> 23:18, 17 September 2016 (UTC)

::::::The only reason it was sufficient to attribute imports to a "(WT-en)" user during the initial import is because there is a corresponding user page with proper attribution; if the new import is done without the corresponding user page (and its corresponding references) then the attribution is probably insufficient. -- [[User:Wrh2|Ryan]] • ([[User talk:Wrh2|talk]]) • 06:13, 18 September 2016 (UTC)

::::::: But we will be providing the name/ pseudonym of the authors and also say that they were active in another website. This actually goes beyond the requirements. I reread the license terms and I can't find any issues. [[User:Acer/License|I placed the sections relevant to attribution here]] and bolded the relevant parts. What part do you think we are failing to comply with? [[User:Acer|Acer]] ([[User talk:Acer|talk]]) 09:47, 18 September 2016 (UTC)

::::::::Creative Commons has, in the past at least, used the terminology "you must attribute the work in the manner specified by the author or licensor". I believe they've changed this terminology in the human-readable versions of their licenses, but I'm not 100% clear on the history. Anyway, when we migrated, out of an abundance of caution, we interpreted "the manner specified by the author" to include not just the username proper, but a link to the author's user page, as well as attribution in the page footer according to one's preferred display name (as opposed to username). Neither of these can happen without importation of the user pages and preferences as well. [[User:LtPowers|Powers]] <small><sup>([[User talk:LtPowers|talk]])</sup></small> 20:26, 19 September 2016 (UTC)

Let me register my '''oppose''' vote on any copying from WT. We might get inspiration for what to write articles on, for sure, but they should be written from the bottom up by us in our own words (and better yet from first-hand experience) rather than copying a single comma from "that other site". Not only is there the concern with the fork/duplication penalty and the possibility that the three paid admins and five spambot IPs that still remain on that site might get similar ideas in reverse, I just don't think we need to do that. Our new content since the fork/migration is good. Sure, there might be a few nuggets of gold in what has happened over there since almost everybody left, but the main thing to copy is, imho, the idea of what to write an article on and not the article itself. Besides, it would be interesting to see a breakdown of how many of those articles are destinations, how many are travel topics and so on. Probably a lot of them would just be redundant or have been axed with good reason over here. [[User:Hobbitschuster|Hobbitschuster]] ([[User talk:Hobbitschuster|talk]]) 01:52, 18 September 2016 (UTC)

::::::: I did a quick sampling of the list and thought about this a little more. I also '''oppose''' the idea of importing these articles. There is little point in having articles with no listings and just walking into a controversy over copyrights. Maybe use this list to identify needed articles, but then just add the location name to [[Wikivoyage:Requested articles]] and add a red link to the appropriate region. --[[User:Traveler100|Traveler100]] ([[User talk:Traveler100|talk]]) 07:30, 18 September 2016 (UTC)

:::::::: That's not happening. The plan is for nothing to be transferred to article space unless it's of sufficient quality and has been properly formatted. I'm just asking here to have these pages in userspace so I and others can work on them. See these pages here; they are much better than the average article we have (hit Random page a few times): [http://wikitravel.org/en/Tychy Tychy] [http://wikitravel.org/en/Cirali Cirali] [http://wikitravel.org/en/Sinj Sinj] [http://wikitravel.org/en/Pian_Camuno Pian Camuno] [http://wikitravel.org/en/Lu%C3%A7on Luçon] [http://wikitravel.org/en/Dania_beach Dania beach]. They already include information for the listings; we just need to format it using the template. There are many others like this, but not that many, a few hundred maybe. Everything else will be discarded. [[User:Acer|Acer]] ([[User talk:Acer|talk]]) 08:51, 18 September 2016 (UTC)

::::::::: I'm struggling to understand the utility of such an import. If I wanted to import any WT (or other CC-licensed) content, then I would just create that article directly in WV or in my user space. What do you actually get out of importing all of these articles?

::::::::: If you know that [[Dania beach]] is a really awesome article on WT then just go ahead and build it here. No need to import it 'to work on' first. [[User:Andrewssi2|Andrewssi2]] ([[User talk:Andrewssi2|talk]]) 20:46, 18 September 2016 (UTC)

:::::::::: Not sure I understood you. You mean copy/paste the text? If we do that then there would need to be an attribution template in the body of the article linking to WT. That's not ideal. Importing the whole history is safer license-wise. Also, I don't know which articles are good and which aren't. That's why I wanted to import into userspace and then do a triage. Finally, importing an XML dump is much simpler/faster than copying hundreds of pages by hand. [[User:Acer|Acer]] ([[User talk:Acer|talk]]) 23:01, 18 September 2016 (UTC)

:::::::::::I think what was meant was that you can see what is in the WT article, verify it from sources outside WT and upgrade / create our article accordingly. I have to reiterate here that I do not think copying anything from that other site under any circumstances is a wise move. In my opinion we have made a lot of headway through things like random drift in our articles (IP editors or simple wording changes) that Google recognizes. Importing or copying from that other site would hurt that more than even the most diligent work over years could help us. I would like a list of genuine travel topics that site has and we don't. Destination articles either come about by people with local knowledge showing up or they don't. Forcing it is not gonna help us. You can get a city article to "usable" without having been there if it is in a country with reasonable "on the internet percentage" for businesses, but those articles do not do much good besides completing regions and whatnot and should not be created just 'cause. The best impetus for new articles is someone with local knowledge starting them. Even if that someone has a limited grasp of English or has touty intentions. So, let's look through the three or four travel topics worth salvaging, create them here from scratch and for the most part forget that other site even exists. I am actually not sure we should even import this stuff to anybody's user page without clear consensus in favor. [[User:Hobbitschuster|Hobbitschuster]] ([[User talk:Hobbitschuster|talk]]) 23:20, 18 September 2016 (UTC)

::::::::::::[[User:Wrh2|Ryan]], [[User:Hobbitschuster|Hobbitschuster]], and (particularly) [[User:LtPowers|Powers]]' arguments have convinced me we're playing with fire here. I think it might be useful to retain the list of articles present on WT and absent here, but beyond that I oppose in the strongest possible terms any notion of copying from the other site, and frankly (germane to Powers' comments) would love for this discussion to be brought to a speedy close, in case any prying eyes from over there are watching. -- [[User:AndreCarrotflower|AndreCarrotflower]] ([[User talk:AndreCarrotflower|talk]]) 01:29, 19 September 2016 (UTC)

::::::::::::: I was trying hard to not give the impression that we would forbid this process, but the benefits are minuscule and we really need to keep away from the IB company if we can possibly help it.

::::::::::::: Basically great intention, but there are safer and better ways to achieve new content. [[User:Andrewssi2|Andrewssi2]] ([[User talk:Andrewssi2|talk]]) 05:32, 19 September 2016 (UTC)

:::::::::::::: That's alright. I've been editing wikis for over a decade now. Sometimes you get your way, sometimes you don't. Archive away :) [[User:Acer|Acer]] ([[User talk:Acer|talk]]) 23:26, 19 September 2016 (UTC)
