feat: Huge rework
- Total rework of the fetching engine:
  + Instead of a pool of 8 threads processing the work sequentially,
    the whole Kanji bank is now divided into 8 (nearly) equal chunks,
    and each thread downloads the pages of its own chunk sequentially.
  + Instead of relying on web.archive.org as a proxy and fetching
    snapshots as old as 2018, we now get the latest data from the
    original website through a Tor proxy.
  + New hvdic parsing logic. The messy code is replaced with an
    object-oriented approach. This allows type-safe scraping of the
    dictionary, as well as serializing the whole hvdic to JSON or
    another format for future use.
  + The old WebArchiveClient is kept as a useful reference (no time or
    enthusiasm yet to turn it into a separate NuGet package).
- Refreshed hvcache with the new pages obtained by this method.
- Built a new out_vn folder.
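
The chunked download scheme described in the first bullet can be sketched roughly as follows. This is an illustrative sketch only, not the code in this commit: the project itself appears to be .NET, and Python is used here for brevity. The names `split_into_chunks`, `download_chunk`, `download_all`, and `fetch_page` are all hypothetical; `fetch_page` stands in for whatever function actually downloads one dictionary page through the proxy.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks absorb the remainder, one item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def download_chunk(chunk, fetch_page):
    # Each worker walks its own chunk in order, one page at a time.
    return [fetch_page(character) for character in chunk]


def download_all(characters, fetch_page, workers=8):
    """Assign one chunk to each worker thread and collect the pages in input order."""
    chunks = split_into_chunks(characters, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(lambda c: download_chunk(c, fetch_page), chunks))
    # Flatten the per-chunk results; chunk order matches the input order.
    return [page for pages in per_chunk for page in pages]
```

With this layout each thread owns a fixed slice of the character list, so no work queue has to be shared between threads. The trade-off is that a slow chunk cannot be picked up by an idle thread.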
trungnt2910 committed Jan 10, 2023
1 parent 8767d50 commit b6f3db3
Showing 9,770 changed files with 545,686 additions and 596,060 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
79 changes: 41 additions & 38 deletions hvcache/%E3%90%86
@@ -25,15 +25,14 @@

<title>Tra từ: 㐆 - Từ điển Hán Nôm</title>
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.11.2/css/all.min.css" type="text/css" rel="stylesheet">
<link href="/style.1590385402.css" type="text/css" rel="stylesheet">
<link href="/style.1649305351.css" type="text/css" rel="stylesheet">

<script type="text/javascript">
var _UseMobileView = false,
_UrlRewrite = true,
_UrlBase = "/";
</script>
<script type="text/javascript" src="/libjs/jquery-2.2.4.min.1586829527.js"></script>
<script type="text/javascript" src="/scripts.1613359947.js"></script>
<script type="text/javascript" src="/scripts.1642564303.js"></script>

<script type="text/javascript">
window.onscroll = LP_OnScroll;
@@ -47,13 +46,25 @@ $(LP_OnStartup);
ga('create', 'UA-31319182-2', 'thivien.net');
ga('require', 'displayfeatures');
ga('send', 'pageview');
</script><script async="" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script><script>
</script><script async="" src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-5466764586357052" crossorigin="anonymous"></script><script>
(adsbygoogle = window.adsbygoogle || []).push({
google_ad_client: "ca-pub-5466764586357052",
enable_page_level_ads: true
});
</script><script async="" src="https://securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
<script>
window.googletag = window.googletag || {cmd: []};
googletag.cmd.push(function() {
googletag.defineSlot("/27973503/Publishers/Desktop/Large1/thivien.net", [[300, 600]], "div-gpt-ad-1632973737400-0").addService(googletag.pubads());
googletag.pubads().enableSingleRequest();
googletag.enableServices();
});
</script></head>
<body style="background-image:url('/images/background.jpg')"><div id="fb-root"></div>
<body>

<script>
if (UsingDarkMode()) ApplyDarkMode(true);
</script><div id="fb-root"></div>
<script>(function(d, s, id) {
var js, fjs = d.getElementsByTagName(s)[0];
if (d.getElementById(id)) return;
@@ -68,38 +79,35 @@ $(LP_OnStartup);
<a class="menu-icon" href="javascript:void(0)"></a>

<ul class="menu"><li class="menu-item current"><a href="/" class="menu-title">Tra tổng hợp</a></li><li class="menu-item menu-group"><span class="menu-title">Tìm chữ</span>
<ul class="submenu"><li class="menu-item"><a href="/rad-hv" class="menu-title">Theo bộ thủ</a></li><li class="menu-item"><a href="/strokes-hv" class="menu-title">Theo nét viết</a></li><li class="menu-item"><a href="/ccd-hv" class="menu-title">Theo hình thái</a></li><li class="menu-divider"></li><li class="menu-item"><a href="/lookup-help.php" class="menu-title">Hướng dẫn</a></li><li class="menu-item"><a href="/pop-hv" class="menu-title">Chữ thông dụng</a></li></ul></li><li class="menu-item menu-group"><span class="menu-title">Chuyển đổi</span>
<ul class="submenu"><li class="menu-item"><a href="/transcript.php#trans" class="menu-title">Chữ Hán phiên âm</a></li><li class="menu-item"><a href="/transcript.php#han" class="menu-title">Phiên âm chữ Hán</a></li><li class="menu-item"><a href="/transcript.php#t2s" class="menu-title">Phồn thể giản thể</a></li><li class="menu-item"><a href="/transcript.php#s2t" class="menu-title">Giản thể phồn thể</a></li></ul></li><li class="menu-item menu-group"><span class="menu-title">Công cụ</span>
<ul class="submenu"><li class="menu-item"><a href="/rad-hv" class="menu-title">Theo bộ thủ</a></li><li class="menu-item"><a href="/strokes-hv" class="menu-title">Theo nét viết</a></li><li class="menu-item"><a href="/ccd-hv" class="menu-title">Theo hình thái</a></li><li class="menu-divider"></li><li class="menu-item"><a href="/reading/1/1" class="menu-title">Theo âm Nhật (onyomi)</a></li><li class="menu-item"><a href="/reading/1/2" class="menu-title">Theo âm Nhật (kunyomi)</a></li><li class="menu-item"><a href="/reading/1/3" class="menu-title">Theo âm Hàn</a></li><li class="menu-item"><a href="/reading/1/4" class="menu-title">Theo âm Quảng Đông</a></li><li class="menu-divider"></li><li class="menu-item"><a href="/lookup-help.php" class="menu-title">Hướng dẫn</a></li><li class="menu-item"><a href="/pop-hv" class="menu-title">Chữ thông dụng</a></li></ul></li><li class="menu-item menu-group"><span class="menu-title">Chuyển đổi</span>
<ul class="submenu"><li class="menu-item"><a href="/transcript.php#trans" class="menu-title">Chữ Hán <i class="fas fa-arrow-right gray"></i> phiên âm</a></li><li class="menu-item"><a href="/transcript.php#han" class="menu-title">Phiên âm <i class="fas fa-arrow-right gray"></i> chữ Hán</a></li><li class="menu-item"><a href="/transcript.php#t2s" class="menu-title">Phồn thể <i class="fas fa-arrow-right gray"></i> giản thể</a></li><li class="menu-item"><a href="/transcript.php#s2t" class="menu-title">Giản thể <i class="fas fa-arrow-right gray"></i> phồn thể</a></li></ul></li><li class="menu-item menu-group"><span class="menu-title">Công cụ</span>
<ul class="submenu"><li class="menu-item"><a href="/applications.php" class="menu-title">Cài đặt ứng dụng</a></li><li class="menu-item"><a href="/writing.php" class="menu-title">Học viết chữ Hán</a></li><li class="menu-item"><a href="/fonts.php" class="menu-title">Font chữ Hán Nôm</a></li></ul></li><li class="menu-item"><a href="/comment.php" class="menu-title">Góp ý</a></li> </ul>
</nav>
</div>
</header>

<section><div style="width:100%; overflow-x:hidden">
<div class="fb-like" data-href="/" data-layout="standard" data-action="like" data-show-faces="false" data-share="true"></div>
</div><div style="text-align:center"><div style="margin:10px 0">
<section><div class="fb-plugin-container" style="width:100%; overflow-x:hidden" data-type="like" data-href="/"></div><div style="text-align:center"><div style="margin:10px 0">
<!-- HVDic-Responsive -->
<ins class="adsbygoogle" style="display:block" data-ad-client="ca-pub-5466764586357052" data-ad-slot="6998088321" data-ad-format="horizontal"></ins>
<ins class="adsbygoogle" style="display:block" data-ad-client="ca-pub-5466764586357052" data-ad-slot="6998088321" data-ad-format="horizontal" data-full-width-responsive="true"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></div><div>
<div class="right-col"><!-- HVDic-BigCol -->
</div></div><div class="main-content">
<div class="ads-col"><!-- HVDic-BigCol -->
<ins class="adsbygoogle" style="display:inline-block;width:300px;height:600px" data-ad-client="ca-pub-5466764586357052" data-ad-slot="1091155526"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>

<p></p>

<div>
<div id="FloatingAds" style="margin:auto">
<script type="text/javascript" src="https://e.eclick.vn/delivery/zone/2272.js"></script>
<div class="sticky-top" style="margin-top: 5px">
<div id="div-gpt-ad-1632973737400-0" style="min-width: 300px; min-height: 250px;">
<script>googletag.cmd.push(function() { googletag.display("div-gpt-ad-1632973737400-0"); });</script>
</div>
</div></div>
<div class="main-content">
<div class="content-col">
<div class="sticky-top">
<form id="LookupForm" class="main-form" method="GET" action="/" data-lang="1" onsubmit="OnLookupFormSubmitted(event)">
<input type="text" readonly="" class="main-input with-vbtn" name="Value" autofocus="" placeholder="車 车 xa xe che1 chē" accesskey="1" value="㐆">
<input type="text" readonly="" class="main-input" name="Value" autofocus="" placeholder="車 车 xa xe che1 chē" accesskey="1" value="㐆">

<div class="vbtn-group">
<button type="button" disabled="" class="mode-selector">文</button><!--
@@ -117,26 +125,21 @@ $(LP_OnStartup);

</form>

<div class="alert">Chưa có giải nghĩa theo âm Hán Việt, bạn có thể tìm thêm thông tin bằng cách:<ul><li><a href="/wpy/㐆">tra theo âm Pinyin</a></li><li>tham khảo các chữ dị thể ở dưới</li></ul></div><div class="hvres"><div style="margin:10px 0">
<!-- HVDic-Responsive -->
<ins class="adsbygoogle" style="display:block" data-ad-client="ca-pub-5466764586357052" data-ad-slot="6998088321" data-ad-format="auto"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></div><div class="hvres han-word">
<div class="alert">Chưa có giải nghĩa theo âm Hán Việt, bạn có thể tìm thêm thông tin bằng cách:<ul><li><a href="/wpy/㐆">tra theo âm Pinyin</a></li><li>tham khảo các chữ dị thể ở dưới</li></ul></div><div class="hvres han-word">
<div class="hvres-header"> <div class="hvres-word han"><a href="/whv/㐆">㐆</a></div> <div class="hvres-definition single"> </div>
</div>

<div class="hvres-details">
<div class="hvres-meaning">Tổng nét: 6<br>Bộ: <a href="/rad-hv/丿">triệt 丿</a> (+5 nét)<br>Nét bút: <a href="/strokes-hv?Strokes=ノノフ一一フ">ノノフ一一フ</a><br>Thương Hiệt: HSMS (竹尸一尸)<br>Unicode: <a href="https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=3406" target="_blank">U+3406</a><br>Độ thông dụng trong Hán ngữ cổ: rất thấp<br>Độ thông dụng trong tiếng Trung hiện đại: rất thấp</div><p class="hvres-source">Âm đọc khác</p>
<div class="hvres-meaning">Âm Pinyin: <a href="/py/yin3">yǐn</a><br>Âm Quảng Đông: <a href="/reading/1/4/zaan2">zaan2</a></div><p class="hvres-source">Dị thể <span class="badge">1</span></p>
<div class="hvres-meaning">Âm Pinyin: <a href="/py/yin3">yǐn <span class="small gray han">ㄧㄣˇ</span></a><br>Âm Quảng Đông: <a href="/reading/1/4/zaan2">zaan2</a></div><p class="hvres-source">Dị thể <span class="badge">1</span></p>
<div class="hvres-meaning"><a href="/whv/𠂣"><span class="hvres-variant han" data-tippy-placement="bottom" data-tippy-content="Cách viết khác">&#131235;</span></a><p class="gray small" style="padding-left:10px"><a href="/fonts.php">Không hiện chữ?</a></p></div> </div>
</div></div>
</div></section>

<div id="footer" style="clear:both">
&nbsp;
</div>

<script type="text/javascript" src="/libjs/avim20080728.min.js"></script></body>
</html>
</div> </div>
</div>
</div> <footer class="small">
<span>&copy; 2001-2023</span>
<div id="dark-mode-selector" class="dropdown">
<span class="dropdown-btn"><i class="fas fa-fw fa-adjust"></i> Màu giao diện <i class="fa fa-caret-down"></i></span>
<div class="dropdown-content align-right z-high"><span class="dropdown-item" data-value="light"><i class="fas fa-fw fa-sun"></i> Luôn sáng</span><span class="dropdown-item" data-value="dark"><i class="fas fa-fw fa-moon"></i> Luôn tối</span><span class="dropdown-item active" data-value="by-system"><i class="fas fa-fw fa-window-maximize"></i> Tự động: theo trình duyệt</span><span class="dropdown-item" data-value="by-time"><i class="fas fa-fw fa-clock"></i> Tự động: theo thời gian ngày/đêm</span></div>
</div>
</footer> </section><script type="text/javascript" src="/libjs/avim20080728.min.js"></script></body>
</html>