Pentaho Data Integration [Kettle]: ETL jobs, ETL transforms, Spoon, Carte...
#1
Hello. Does anyone have experience using Kettle to push data to Amazon S3? I'm interested in learning about the approaches others have used.
Thanks, Chris
#2
Hi Chris,
"Data" can mean a lot of things. If you're talking about CSV/XML files, there isn't much available out of the box. On Linux there are ways to mount S3 as a filesystem (s3fs, for example, which uses FUSE). Personally, I used an Amazon library to create a parallel reader for S3; a writer shouldn't be too hard. Data can also mean a relational database: for example, MySQL has an AWS S3 storage engine by Mark Atwood.
Matt
__________________
Matt Casters, Chief Data Integration
Pentaho, Open Source Business Intelligence
http://www.pentaho.org -- mcasters@pentaho.org
Join us on IRC server Freenode.net, channel ##pentaho
#3
Hi Matt,
In this case, I'm talking about files. Thanks for the links, I'll take a look. I was thinking about an S3 writer. Then I could create a PDI job to extract data from the source systems, write it to files, compress the files, and then write them to S3 in the cloud.
Chris
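The compress-and-upload part of the job Chris describes can be sketched in plain Java. This is a minimal sketch under stated assumptions, not a Kettle step or a PDI API: it assumes the S3 bucket is already mounted as a local directory (for example with s3fs, as suggested in this thread), so "writing to S3" reduces to writing a file into the mount point. The class name `CompressAndStage`, the method `gzipTo`, and the paths are hypothetical; gzip is used only because it ships with the JDK. With a library such as JetS3t, the final file write would instead become an upload call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical helper: gzip an extract file and place the result in a target
 * directory. ASSUMPTION: the target directory is an S3 bucket mounted via
 * s3fs, so writing the file there is what pushes it to S3.
 */
public class CompressAndStage {

    /** Compresses {@code source} into {@code targetDir/<name>.gz} and returns that path. */
    public static Path gzipTo(Path source, Path targetDir) throws IOException {
        Files.createDirectories(targetDir); // no-op if the mount point already exists
        Path target = targetDir.resolve(source.getFileName().toString() + ".gz");
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            // Copy in fixed-size chunks so large extracts are never held fully in memory.
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } // closing the GZIPOutputStream writes the gzip trailer and closes the file
        return target;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: extract file written by the PDI job (placeholder)
        // args[1]: directory where the S3 bucket is mounted (placeholder)
        if (args.length != 2) {
            System.err.println("usage: java CompressAndStage <extract-file> <s3-mount-dir>");
            System.exit(1);
        }
        Path staged = gzipTo(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + staged);
    }
}
```

A run might look like `java CompressAndStage /data/extract/orders.csv /mnt/s3bucket`, where both paths are placeholders. Keeping compression and the "upload" as one streaming write avoids a second full copy of each extract on local disk.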
#4
You could probably start from the "Text File Output" step and convert it to use the JetS3t Java library.
#5
Hi Matt,
I was also looking at using something like Jungle Disk ($20): http://www.jungledisk.com/index.aspx. Then I could mount S3 on the PDI server, and Jungle Disk would handle file transfer and encryption. For an extra $1 a month, it also offers incremental backup and can resume large file transfers from the point of failure. Do you know if there is an EC2 machine image with PDI?
Chris
#6
If you're using Linux, s3fs will do the trick; I had some trouble with Jungle Disk myself.
Besides that, whatever gets the job done :-) No PDI image yet, sorry. Since I need a few for the MySQL UC (let's meet up again!), some should pop up in the next couple of months. I'll make sure to make them public.
Take care, Matt
#7
Great, I'll be at the MySQL UC this year, speaking on the last day: http://en.oreilly.com/mysql2009/publ...le/detail/5593
#8
I have my solo session on Wednesday: http://en.oreilly.com/mysql2009/publ...le/detail/6739
On Tuesday I'm presenting with Roland Bouman: http://en.oreilly.com/mysql2009/publ...le/detail/7016
Cheers, Matt