44.71 GB. This is the size of the source code files allegedly stolen from Yandex, the most widely used search engine in Russia. The archive containing the data was posted on a forum popular with cybercriminals on January 25. Borderline2023, the user behind the post, claims to have obtained the documents in July 2022.
All of the source code developed by the company is said to be present in these files. According to an analysis by software engineer Arseni Chestakov, the archive contains “certainly recent source code” intended for “business service”. Interactive maps, messaging, online storage, an Uber-style taxi service… The source code of at least 13 services belonging to Yandex would have leaked, according to the analyst, who was able to confirm the information with employees of the company.
More interesting still, the files would largely reveal Yandex’s search algorithm and website ranking criteria. A real gold mine for SEO specialists, who have spent years trying to decipher such micro-signals in order to optimize the ranking of sites within search results pages.
Sometimes original criteria
Alex Buraks, a specialist in SEO (search engine optimization), began to analyze part of the source code of the Yandex algorithm. He published his first conclusions in a detailed thread on Twitter. Although the Russian search engine’s approach differs from that of its main competitors, its ranking of sites could prove close to that practiced by Google’s algorithms, according to experts in the field.
Specifically, Yandex would favor recent pages (fresh URLs) in its results, as well as those with a lot of organic traffic (unique visitors). Conversely, URLs containing numbers or many slashes (“/”) would be disadvantaged. Sites built on reliable pages with few errors (such as a deleted or missing page returning a 404 code) would also be preferred. The age of a web page and the date of its last update could also influence rankings, as could, to a lesser extent, how quickly users click on a link or the time they spend on a site. More surprisingly, Wikipedia pages would be better ranked by the algorithm.
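To give an idea of how such signals might combine, here is a purely illustrative toy scoring function in Python. The weights, field names, and formula are invented for this sketch and are not taken from the leaked Yandex code; it only mirrors the tendencies described above (freshness and traffic rewarded, digits and slashes in URLs penalized, error-prone sites demoted, clicks counted as a weak signal).

```python
def toy_relevance_score(page: dict) -> float:
    """Hypothetical weighted score combining signals like those described.
    All weights and keys are invented for illustration."""
    score = 0.0
    score += 2.0 if page.get("is_fresh") else 0.0              # recent pages favored
    score += min(page.get("unique_visitors", 0) / 1000, 3.0)   # organic traffic, capped
    url = page.get("url", "")
    score -= 0.5 * url.count("/")                              # many slashes penalized
    score -= 1.0 if any(c.isdigit() for c in url) else 0.0     # digits in the URL penalized
    score -= 2.0 * page.get("error_rate", 0.0)                 # sites prone to 404s demoted
    score += 1.5 * page.get("click_through_rate", 0.0)         # user clicks as a weak signal
    return score

# Example: a fresh, popular article vs. a stale, error-prone page with digits in its URL
fresh = toy_relevance_score({
    "is_fresh": True, "unique_visitors": 5000,
    "url": "https://example.com/article",
    "error_rate": 0.0, "click_through_rate": 0.3,
})
stale = toy_relevance_score({
    "is_fresh": False, "unique_visitors": 100,
    "url": "https://example.com/2021/01/post99",
    "error_rate": 0.2, "click_through_rate": 0.0,
})
```

Run side by side, the fresh page scores higher, which is the only point of the sketch: individually weak signals add up to a ranking preference.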
For the most curious, the full list of the 1,922 search relevance criteria has been published on the Webmarketing School website. A highly damaging information leak for Yandex, and one that confirms the main ranking factors SEO experts had sensed in recent years.