Cap the estimated ratios to interval 0..1
janmotl committed Oct 13, 2020
1 parent 8ae93db commit ef438d5
Showing 5 changed files with 19 additions and 5 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ If the schema quality is extremely low (e.g. all columns are typed as text and h
## Known issues
If you are using MySQL and get `Access to data dictionary table 'mysql.table_stats' is rejected` then it is because MySQL, contrary to MariaDB, prevents access to internal tables. To be able to run Linkifier, start the db in the [debug mode](http://datacharmer.blogspot.com/2016/09/showing-hidden-tables-in-mysql-8-data.html).

If you have problems connecting to MSSQL, make sure login with [username-password](https://serverfault.com/questions/246951/set-a-login-with-username-password-for-sql-server-2008-express) combination is permitted and [TCP/IP](https://stackoverflow.com/questions/18841744/jdbc-connection-failed-error-tcp-ip-connection-to-host-failed) protocol is permitted. If you want to use Windows Authentication instead of username-password combination, a correct version of `sqljdbc_auth.dll` (32/64bit) must be in the Java path. Possibly the easiest solution is to copy `sqljdbc_auth.dll` from source directory `lib\mssql\auth\` into bin directory in Java (e.g. `C:\Program Files\Java\jre8\bin\`). Alternatively, start Linkifier from the command line with arguments like `java -Djava.library.path=c:\path\to\sqljdbc_auth_directory -jar linkifier-3.2.6.jar`.
If you have problems connecting to MSSQL, make sure login with [username-password](https://serverfault.com/questions/246951/set-a-login-with-username-password-for-sql-server-2008-express) combination is permitted and [TCP/IP](https://stackoverflow.com/questions/18841744/jdbc-connection-failed-error-tcp-ip-connection-to-host-failed) protocol is permitted. If you want to use Windows Authentication instead of username-password combination, a correct version of `sqljdbc_auth.dll` (32/64bit) must be in the Java path. Possibly the easiest solution is to copy `sqljdbc_auth.dll` from source directory `lib\mssql\auth\` into bin directory in Java (e.g. `C:\Program Files\Java\jre8\bin\`). Alternatively, start Linkifier from the command line with arguments like `java -Djava.library.path=C:\path\to\sqljdbc_auth_directory -jar linkifier.jar`.

If you have any question or suggestion, let me know.
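
When Windows Authentication still fails, it can help to confirm that the JVM actually sees `sqljdbc_auth.dll`. The following is only a rough diagnostic sketch, not part of Linkifier: the class name `LibraryPathCheck` and the helper `isOnLibraryPath` are hypothetical, and it merely scans the directories listed in the standard `java.library.path` system property.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of the Linkifier code base.
class LibraryPathCheck {

    // Returns true if fileName exists in any directory listed in libraryPath.
    static boolean isOnLibraryPath(String libraryPath, String fileName) {
        if (libraryPath == null) return false;
        // Directories are separated by ';' on Windows and ':' elsewhere.
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (!dir.isEmpty() && new File(dir, fileName).isFile()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + path);
        System.out.println("sqljdbc_auth.dll found: " + isOnLibraryPath(path, "sqljdbc_auth.dll"));
    }
}
```

Running it with the same `-Djava.library.path=...` argument as Linkifier shows whether the directory you passed contains the DLL. It does not check whether the DLL matches the JVM's 32/64-bit architecture.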

2 changes: 1 addition & 1 deletion build.gradle
@@ -1,7 +1,7 @@
apply plugin: 'application'

group 'linkifier'
version '3.2.6'
version '3.2.7'
sourceCompatibility = 1.8
mainClassName = 'controller.MainApp'

12 changes: 10 additions & 2 deletions src/vendor/MSSQL.java
@@ -207,8 +207,16 @@ public void getColumnStatistics(String databaseName, String schemaName, List<Tab
}
column.setTextMin(rs.getString(3));
column.setTextMax(rs.getString(4));
column.setNullRatio((column.getRowCount()==null || column.getRowCount()==0) ? null : rs.getDouble(5) / column.getRowCount());
column.setUniqueRatio(column.getRowCount()==null || column.getRowCount()==0 ? null : (1/rs.getDouble(6)) / column.getRowCount());
// We are working with estimates, and MSSQL sometimes returns an estimated null count that exceeds the estimated row count.
// That breaks the WidthAvgWithoutNulls estimate below, so we cap the ratio to 0..1.
// Keep the ratio as a nullable Double: unboxing the null branch into a primitive double would throw a NullPointerException.
Double nullRatio = (column.getRowCount()==null || column.getRowCount()==0) ? null : rs.getDouble(5) / column.getRowCount();
if (nullRatio != null) nullRatio = Math.min(Math.max(nullRatio, 0.0), 1.0);
column.setNullRatio(nullRatio);
// We preventively cap the unique ratio as well.
Double uniqueRatio = (column.getRowCount()==null || column.getRowCount()==0) ? null : (1/rs.getDouble(6)) / column.getRowCount();
if (uniqueRatio != null) uniqueRatio = Math.min(Math.max(uniqueRatio, 0.0), 1.0);
column.setUniqueRatio(uniqueRatio);
// Azure counts nulls in widthAvg. But for FK-PK match detection it is better to exclude nulls from widthAvg
// as PK should not contain nulls but FK may contain nulls.
// WidthAvgWithoutNulls = widthAvg/(1-nullRatio)
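
To see why the cap matters for `WidthAvgWithoutNulls = widthAvg/(1-nullRatio)`, here is a minimal standalone sketch. The class `RatioCap` and both helper names are hypothetical and do not exist in Linkifier. It assumes that an unknown ratio is represented as `null`, as the surrounding code suggests. It also treats a capped ratio of exactly 1 as "undefined" to avoid dividing by zero, which is an extra assumption that the commit itself does not make.

```java
// Hypothetical illustration; names are not taken from the Linkifier sources.
class RatioCap {

    // Clamps an estimated ratio to the interval 0..1; null (unknown) passes through.
    static Double clamp01(Double ratio) {
        if (ratio == null) return null;
        return Math.min(Math.max(ratio, 0.0), 1.0);
    }

    // Computes widthAvg / (1 - nullRatio), or null when the result is undefined.
    static Double widthAvgWithoutNulls(Double widthAvg, Double nullRatio) {
        if (widthAvg == null || nullRatio == null) return null;
        double capped = clamp01(nullRatio);
        if (capped >= 1.0) return null; // every row is null: no non-null width to average
        return widthAvg / (1.0 - capped);
    }

    public static void main(String[] args) {
        // An uncapped estimate above 1 yields a negative "average width".
        System.out.println(4.0 / (1.0 - 1.3));
        // After capping, the ratio is valid and the width is either sensible or undefined.
        System.out.println(clamp01(1.3));
        System.out.println(widthAvgWithoutNulls(4.0, 0.5));
        System.out.println(widthAvgWithoutNulls(4.0, 1.3));
    }
}
```

Without the cap, a null-count estimate larger than the row-count estimate makes `1 - nullRatio` negative, so the derived width becomes negative as well.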
3 changes: 3 additions & 0 deletions src/vendor/QualityControl.java
@@ -26,6 +26,9 @@ public static void qcNumericalValues(List<Table> tables) {
if (column.getUniqueRatio() != null && column.getUniqueRatio() < 0) LOGGER.severe("UniqueRatio " + table.getName() + "." + column.getName() + " is negative");
if (column.getWidthAvg() != null && column.getWidthAvg() < 0) LOGGER.severe("WidthAvg " + table.getName() + "." + column.getName() + " is negative");

if (column.getNullRatio() != null && column.getNullRatio() > 1) LOGGER.severe("NullRatio " + table.getName() + "." + column.getName() + " is bigger than 1");
if (column.getUniqueRatio() != null && column.getUniqueRatio() > 1) LOGGER.severe("UniqueRatio " + table.getName() + "." + column.getName() + " is bigger than 1");

// Integers
if (column.getRowCount() != null && column.getRowCount() < 0) LOGGER.severe("RowCount " + table.getName() + "." + column.getName() + " is negative");
}
5 changes: 4 additions & 1 deletion test/main/VendorTest.java
@@ -194,7 +194,10 @@ public void PostgreSQL_number() throws Exception {
vendor.getColumnStatistics("PredictorFactory", "financial", tables, connection);
}

// There are 760931 null records
// There are 760931 null records in "trans"."account" column.
// Note that by default the database stores sample statistics -> we may have to rerun:
// ANALYZE financial.trans
// in the database.
assertEquals(760931.0/1056320.0, tables.get(0).getColumn("account").getNullRatio(), 0.01);
// There are 7665 non-null unique records from all records
assertEquals(7665.0/1056320.0, tables.get(0).getColumn("account").getUniqueRatio(), 0.01);
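
The assertions above compare exact ratios against sampled estimates with a tolerance of 0.01. A small sketch of that comparison, without JUnit, might look as follows. The class `RatioTolerance`, the helper `withinDelta`, and the sample estimates 0.715 and 0.70 are made up for illustration. The row counts come from the test comments.

```java
// Hypothetical illustration of a delta-based comparison; not part of the test suite.
class RatioTolerance {

    // Mirrors the usual delta check: passes when |expected - actual| <= delta.
    static boolean withinDelta(double expected, double actual, double delta) {
        return Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        double rows = 1056320.0;
        double expectedNullRatio = 760931.0 / rows;   // roughly 0.72
        double expectedUniqueRatio = 7665.0 / rows;   // roughly 0.007
        System.out.println(expectedNullRatio);
        System.out.println(expectedUniqueRatio);
        // Made-up estimates: one close enough, one that would call for a fresh ANALYZE.
        System.out.println(withinDelta(expectedNullRatio, 0.715, 0.01));
        System.out.println(withinDelta(expectedNullRatio, 0.70, 0.01));
    }
}
```

If the stored statistics come from a stale or small sample, the estimate may drift outside the tolerance, which is why the comment suggests rerunning `ANALYZE financial.trans`.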
