
AvroSerdeException when importing data to avro table #28

Open
ramasLTU opened this issue Jun 5, 2013 · 1 comment

Comments


ramasLTU commented Jun 5, 2013

Hi, I have problems with a simple task:

  1. create hive table (stored as textfile compressed with bz2)
  2. import that table to partitioned (and compressed) avro table

Here is a short tale of me hitting the wall. Maybe you can identify which turn I missed..

  1. create the text table:

CREATE EXTERNAL TABLE sample(
number int,
text string
) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/root/sample/'
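For reference, a bz2-compressed, tab-delimited file matching this table layout could be staged with a short Python sketch like the one below (the file name and the HDFS upload command are illustrative, not from the issue):

```python
import bz2

# Write the sample rows as tab-delimited text, bz2-compressed,
# matching the DELIMITED FIELDS TERMINATED BY '\t' layout above.
rows = [(1, "row1"), (2, "row2"), (3, "row3")]
with bz2.open("sample.txt.bz2", "wt") as f:
    for number, text in rows:
        f.write(f"{number}\t{text}\n")

# Then upload it into the table's location, e.g.:
#   hdfs dfs -put sample.txt.bz2 /user/root/sample/
```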

  2. create a text file with a couple of rows in this fashion:
    1 row1
    2 row2
    3 row3
  3. compress that file with bz2, upload it, and check that the table returns rows when selected: SELECT * FROM sample; works like a charm for me.
  4. create a partitioned Avro table:

CREATE TABLE sample_avro
PARTITIONED BY (number int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED as INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "my.sample",
"name": "sample_avro",
"type": "record",
"fields": [ { "name":"text","type":"string"}]
}')
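One detail worth noting in the DDL above: the partition column `number` is deliberately absent from the Avro schema, since Hive carries partition columns in directory names rather than in the stored records. A quick sanity check that the schema literal is valid JSON and lists only the data columns (a sketch, using only the schema text from the issue):

```python
import json

# The avro.schema.literal from the CREATE TABLE above.
schema = json.loads("""
{
  "namespace": "my.sample",
  "name": "sample_avro",
  "type": "record",
  "fields": [ { "name": "text", "type": "string" } ]
}
""")

# Only "text" appears; "number" comes from the partition spec.
field_names = [f["name"] for f in schema["fields"]]
print(field_names)
```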

  5. import data into the table:
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.compress.output=true;
    INSERT INTO TABLE sample_avro partition (number)
    SELECT text, number FROM sample;

This is the moment when bad things happen... In the log I can see:

13/06/04 09:52:28 INFO exec.MoveTask: Partition is: {number=null}
13/06/04 09:52:28 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:66)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:87)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:59)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:249)
at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:251)
at org.apache.hadoop.hive.ql.metadata.Partition.initialize(Partition.java:217)
at org.apache.hadoop.hive.ql.metadata.Partition.&lt;init&gt;(Partition.java:107)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1500)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1195)
at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1271)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:259)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.execute(BeeswaxServiceImpl.java:344)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:609)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:598)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:337)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1388)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1.run(BeeswaxServiceImpl.java:598)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

When selecting from sample_avro I get a similar exception..

wagnermarkd (Collaborator) commented

Hi,

I assume you're running Hive 0.11. Unfortunately, there's a bug in 0.11 that breaks partitioned Avro tables: the table properties holding the schema don't get passed to the SerDe properly, leading to the exception about the missing avro.schema.* properties. There's a JIRA open to track the issue: https://issues.apache.org/jira/browse/HIVE-3953. This will be fixed in the next release.

Thanks,
Mark
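Until that fix lands, one workaround sometimes reported on HIVE-3953 is to declare the schema in SERDEPROPERTIES rather than TBLPROPERTIES, so the SerDe sees it when partitions are initialized. This is a sketch only (I have not verified it against 0.11); the property name is unchanged:

```sql
CREATE TABLE sample_avro
PARTITIONED BY (number int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'avro.schema.literal'='{
"namespace": "my.sample",
"name": "sample_avro",
"type": "record",
"fields": [ { "name":"text","type":"string"}]
}')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
```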
