
AvroSerdeException when importing data to avro table #28

Open
ramasLTU opened this issue Jun 5, 2013 · 1 comment

Comments


ramasLTU commented Jun 5, 2013

Hi, I have problems with a simple task:

  1. create hive table (stored as textfile compressed with bz2)
  2. import that table to partitioned (and compressed) avro table

Here is a short tale of me hitting the wall. Maybe you can identify which turn I missed..

  1. create the text table:

CREATE EXTERNAL TABLE sample(
number int,
text string
) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/root/sample/'
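For reference, a bz2-compressed, tab-delimited file matching this table layout could be staged with a short Python sketch like the one below (the file name and the HDFS upload command are illustrative, not from the issue):

```python
import bz2

# Write the sample rows as tab-delimited text, bz2-compressed,
# matching the DELIMITED FIELDS TERMINATED BY '\t' layout above.
rows = [(1, "row1"), (2, "row2"), (3, "row3")]
with bz2.open("sample.txt.bz2", "wt") as f:
    for number, text in rows:
        f.write(f"{number}\t{text}\n")

# Then upload it into the table's location, e.g.:
#   hdfs dfs -put sample.txt.bz2 /user/root/sample/
```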

  2. create a text file with a couple of rows in this fashion:
    1 row1
    2 row2
    3 row3
  3. compress that file with bz2, upload it, and check that the table returns rows when selected: SELECT * FROM sample; works like a charm for me.
  4. create a partitioned Avro table:

CREATE TABLE sample_avro
PARTITIONED BY (number int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED as INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "my.sample",
"name": "sample_avro",
"type": "record",
"fields": [ { "name":"text","type":"string"}]
}')
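One detail worth noting in the DDL above: the partition column `number` is deliberately absent from the Avro schema, since Hive carries partition columns in directory names rather than in the stored records. A quick sanity check that the schema literal is valid JSON and lists only the data columns (a sketch, using only the schema text from the issue):

```python
import json

# The avro.schema.literal from the CREATE TABLE above.
schema = json.loads("""
{
  "namespace": "my.sample",
  "name": "sample_avro",
  "type": "record",
  "fields": [ { "name": "text", "type": "string" } ]
}
""")

# Only "text" appears; "number" comes from the partition spec.
field_names = [f["name"] for f in schema["fields"]]
print(field_names)
```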

  5. import data into the table:
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.compress.output=true;
    INSERT INTO TABLE sample_avro partition (number)
    SELECT text, number FROM sample;

This is the moment when bad things happen... In the log I can see:

13/06/04 09:52:28 INFO exec.MoveTask: Partition is: {number=null}
13/06/04 09:52:28 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:66)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:87)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:59)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:249)
at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:251)
at org.apache.hadoop.hive.ql.metadata.Partition.initialize(Partition.java:217)
at org.apache.hadoop.hive.ql.metadata.Partition.&lt;init&gt;(Partition.java:107)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1500)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1195)
at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1271)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:259)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.execute(BeeswaxServiceImpl.java:344)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:609)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:598)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:337)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1388)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1.run(BeeswaxServiceImpl.java:598)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

When selecting from sample_avro I get a similar exception..

wagnermarkd (Collaborator) commented

Hi,

I assume you're running Hive 0.11. Unfortunately, there's a bug in 0.11 that breaks partitioned Avro tables: the table properties holding the schema don't get passed to the SerDe properly, leading to the exception about the missing avro.schema.* properties. There's a JIRA open to track the issue: https://issues.apache.org/jira/browse/HIVE-3953. This will be fixed in the next release.

Thanks,
Mark
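Until that fix lands, one workaround sometimes reported on HIVE-3953 is to declare the schema in SERDEPROPERTIES rather than TBLPROPERTIES, so the SerDe sees it when partitions are initialized. This is a sketch only (I have not verified it against 0.11); the property name is unchanged:

```sql
CREATE TABLE sample_avro
PARTITIONED BY (number int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'avro.schema.literal'='{
"namespace": "my.sample",
"name": "sample_avro",
"type": "record",
"fields": [ { "name":"text","type":"string"}]
}')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
```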
