-
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Contribute Morphuntion from Apple as open source software #35
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round. I'll go into actual code in the next.
@@ -0,0 +1,118 @@ | |||
# | |||
# Copyright 2018-2024 Apple Inc. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we keep the Apple copyright message? Mostly a question for Anne and George.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the email discussion, the previously included license as been removed in order to use the existing license in this repository, but these copyright notices should not be disturbed.
morphuntion/CMakeLists.txt
Outdated
add_link_options(-g -fprofile-instr-generate -fcoverage-mapping) | ||
endif() | ||
|
||
#Set these warning properties on a project level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Some comments start with capital letters, some don't. Some have space between # and text and some don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed this file to make the comments more consistent in style.
|
||
plugins { | ||
id 'net.ltgt.errorprone' version '1.1.1' apply false | ||
id 'maven-publish' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question - what do we use Java for in this project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dictionary-parser. Denny will likely want to look at that code. It can be reimplemented in another programming language when using wikidata as a data source, but I wanted to provide the tool source code so that the logic behind generating the lexical dictionaries is a lot clearer. The tool will likely be deleted once wikidata can be ingested.
This also had a Java wrapper through JNI. That implementation was excluded from this contribution at this time to expedite making this implementation open source.
# | ||
# Copyright 2022-2024 Apple Inc. All rights reserved. | ||
# | ||
yue_HK=yue_Hant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just mapping one form of LangID to another? Why (comment)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an aliasing scheme. In this case, we're making the script explicit for a given region.
tokenizer.normalize=\u062A | ||
tokenizer.replace=\u0629 | ||
|
||
#tokenizer.decompound=(\u0648|\u0641)?(\u0628|\u0643|\u0644)?(\u0627\u0644)?(.+?)(\u064A|\u0646\u064A|\u0643|\u0646\u0627|\u0643\u0645|\u0643\u0646|\u0647|\u0647\u0627|\u0643\u0646|\u0647\u0645|\u0647\u0646)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been removed. It was informative about the previous implementation, but it's not helpful anymore.
/* | ||
* Copyright 2016-2024 Apple Inc. All rights reserved. | ||
*/ | ||
// This is a generated file. Do not edit this file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care how it's generated? If yes, some epointers to the generator would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generated with the generateICUWrapper tool. It takes an existing ICU installation, and generates C++ wrappers around the C API. This provides a natural adaptor for C++ API, but with the stability of the C ABI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file can be regenerated when ICU adds new C API.
BreakIterator(UBreakIteratorType type, const char * locale, std::u16string_view text) { | ||
UErrorCode ec = U_ZERO_ERROR; | ||
wrappee_ = ubrk_open(type, locale, (const UChar *)text.data(), (int32_t)text.size(), &ec); | ||
::morphuntion::exception::ICUException::throwOnFailure(ec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some companies don't allow exceptions in code. Do these bubble up to top or are they always caught and then API returns an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're not caught on the C++ API. The C API wrappers around the C++ code converts the exceptions to CFErrorRef. If you don't want C++ exceptions, use the C API. If you want C++ exceptions, use the C++ API. C++ library code also generates exceptions. So you can't completely omit them from C++.
/* | ||
* Copyright 2016-2024 Apple Inc. All rights reserved. | ||
*/ | ||
#pragma once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this works everywhere we care about (we use guards instead). https://en.wikipedia.org/wiki/Pragma_once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That page doesn't show any compilers that don't support it. I guess the only issue would be if you want to use an old compiler. From a maintenance point of view, this is much more reliable than using macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub makes reviewing 800+ files PR a nightmare (everything is super laggy and slow). Let's land this and then go over important details in follow up PRs.
I guess current legal discussion is preventing the merge.
@@ -0,0 +1,551 @@ | |||
// | |||
// main.cpp | |||
// icu4cxx_generate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if, long term, we could convert this into Python or some kind of a template language e.g. https://github.com/kainjow/Mustache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, rewriting this in another programming language is fine. It’s rarely run anyway.
This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19 This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25 This contribution also has an implementation that addresses these CLDR issues: 13025, 13563
This contribution is related to the Unicode Technical Workshop topic about Inflection.
This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19
This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25
This contribution also has an implementation that addresses these CLDR issues: 13025, 13563