Pentaho Community Forums

Pentaho Data Integration [Kettle] ETL jobs, ETL transforms, Spoon, Carte...

  #1  
01-12-2009, 09:15 PM
clavigne
Senior Member
Join Date: Mar 2006
Posts: 146
Amazon S3

Hello. Does anyone have experience using Kettle to push data to Amazon S3? I'm interested in learning about approaches others may have used.

Thanks,

Chris
  #2  
01-13-2009, 04:31 AM
MattCasters
Chief Data Integration
Join Date: Nov 1999
Posts: 6,782

Hi Chris,

Data can mean a lot of things. If you mean CSV/XML files, there isn't much available out of the box.
On Linux there are ways to mount S3 as a filesystem (s3fs, for example, which is FUSE-based).

Personally, I used an Amazon library to create a parallel reader for S3; a writer shouldn't be too hard.

Data can also mean a relational database. For example, MySQL has an AWS S3 storage engine by Mark Atwood.

Matt
__________________
Matt Casters, Chief Data Integration
Pentaho, Open Source Business Intelligence
http://www.pentaho.org -- mcasters@pentaho.org

Join us on IRC server Freenode.net, channel ##pentaho
  #3  
01-13-2009, 11:47 AM
clavigne
Senior Member
Join Date: Mar 2006
Posts: 146
Amazon S3

Hi Matt,

In this case, I'm talking about files. Thanks for the links, I'll take a look. I was thinking about an S3 writer. Then I could create a PDI job to extract data from source systems, write it to files, compress the files, and then write them to an S3 file system in the cloud.

Chris
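
The workflow described above (write files, compress them, then place them on S3) can be sketched in Java, assuming the bucket is already mounted as a local directory, for example via s3fs. This is only a minimal sketch, not a PDI step: the class name `S3MountPublisher`, its methods, and the paths passed on the command line are hypothetical, and it uses nothing beyond the Java standard library.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip an extract file, then copy it into a directory
// that is ASSUMED to be an S3 bucket mounted locally (e.g. with s3fs).
public class S3MountPublisher {

    /** Gzip-compresses source into a sibling file named "<name>.gz" and returns its path. */
    public static Path gzip(Path source) throws IOException {
        Path target = source.resolveSibling(source.getFileName().toString() + ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            Files.copy(source, out); // stream the whole file through the gzip encoder
        }
        return target;
    }

    /** Copies file into mountDir, replacing any existing file of the same name. */
    public static Path publish(Path file, Path mountDir) throws IOException {
        Files.createDirectories(mountDir);
        Path destination = mountDir.resolve(file.getFileName());
        return Files.copy(file, destination, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: S3MountPublisher <extract-file> <mount-dir>");
            return;
        }
        Path compressed = gzip(Paths.get(args[0]));
        System.out.println("Published " + publish(compressed, Paths.get(args[1])));
    }
}
```

It could be run as `java S3MountPublisher /tmp/extract.csv /mnt/s3/exports`, where both paths are placeholders. Because the upload is an ordinary file copy, retries and failure handling depend entirely on whichever tool provides the mount.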
  #4  
01-13-2009, 12:27 PM
MattCasters
Chief Data Integration
Join Date: Nov 1999
Posts: 6,782

You could probably start from the "Text File Output" step and adapt it to write through the JetS3t Java library.
  #5  
01-13-2009, 01:08 PM
clavigne
Senior Member
Join Date: Mar 2006
Posts: 146
S3

Hi Matt,

I was also looking at using something like Jungle Disk ($20) - http://www.jungledisk.com/index.aspx. Then I could mount the S3 bucket on the PDI server. Jungle Disk would handle file transfer and encryption. For $1 a month extra, it also has incremental backup capability and can restart large file transfers from the point of failure.

Do you know if there is an EC2 machine image with PDI?

Chris
  #6  
01-13-2009, 01:31 PM
MattCasters
Chief Data Integration
Join Date: Nov 1999
Posts: 6,782

If you're using Linux, s3fs will do the trick; I had some trouble with JungleDisk myself.
Besides that, whatever gets the job done :-)

No PDI image yet, sorry. Since I need a few for MySQL UC (let's meet up again!) there should be a few popping up in the next couple of months though.
I'll make sure to make them public.

Take care,
Matt
  #7  
01-13-2009, 01:34 PM
clavigne
Senior Member
Join Date: Mar 2006
Posts: 146
S3

Great, I'll be at the MySQL UC this year - speaking on the last day - http://en.oreilly.com/mysql2009/publ...le/detail/5593
  #8  
01-13-2009, 01:40 PM
MattCasters
Chief Data Integration
Join Date: Nov 1999
Posts: 6,782

I have my solo session on Wednesday: http://en.oreilly.com/mysql2009/publ...le/detail/6739
On Tuesday I'm presenting with Roland Bouman: http://en.oreilly.com/mysql2009/publ...le/detail/7016

Cheers,
Matt