Commit
- Total rework of the fetching engine:
  + Instead of a pool of 8 threads processing the work sequentially, the whole Kanji bank is divided into 8 (nearly) equal chunks, and each thread sequentially downloads every page in its own chunk.
  + Instead of relying on web.archive.org as a proxy and fetching data as old as 2018, we now have access to the latest data from the original website through the Tor proxy.
  + New hvdic parsing logic. The messy code is replaced with an object-oriented approach. This allows type-safe scraping of the dictionary, as well as serializing the whole hvdic as JSON or some other format for future use.
  + The old WebArchiveClient is still kept as a useful reference (I don't have the time or enthusiasm to make it a separate NuGet package yet).
- Refreshed hvcache with the new pages obtained by this method.
- A new out_vn folder is built.
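The chunked download scheme described in the first bullet can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: `split_into_chunks`, `fetch_page`, `download_chunk`, and `download_all` are hypothetical names, and `fetch_page` is a stand-in that does no real network or proxy access.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def fetch_page(kanji):
    # Hypothetical placeholder: a real fetcher would request the page
    # for this character through the proxy and return its HTML.
    return f"<page for {kanji}>"


def download_chunk(chunk):
    # Each worker walks its own chunk sequentially, one page at a time.
    return [(kanji, fetch_page(kanji)) for kanji in chunk]


def download_all(kanji_bank, n_threads=8):
    chunks = split_into_chunks(kanji_bank, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # One task per chunk; map() yields results in chunk order.
        results = pool.map(download_chunk, chunks)
        return [pair for chunk_result in results for pair in chunk_result]
```

Because every thread owns a fixed slice of the bank, the workers never contend for a shared queue, at the cost of uneven finish times if some pages are slower than others.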
1 parent 8767d50, commit b6f3db3
Showing 9,770 changed files with 545,686 additions and 596,060 deletions.