• Stars
    225
  • Rank 177,187 (Top 4 %)
  • Language
    HTML
  • License
    MIT License
  • Created over 15 years ago
  • Updated almost 9 years ago


Repository Details

Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs. Returns an enumerator of Arrays or Hashes, depending on whether there are headers.

remote_table

Open Google Docs spreadsheets, local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files.

Tested on MRI 1.8, MRI 1.9, and JRuby 1.6.7+. Thread-safe.

Sponsor


We use remote_table for data-driven marketing at Faraday.

Example

>> require 'remote_table'
=> true
>> t = RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'
=> #<RemoteTable:0x00000101b87390 @download_count_mutex=#<Mutex:0x00000101b87228>, @extend_bang_mutex=#<Mutex:0x00000101b871d8>, @errata_mutex=#<Mutex:0x00000101b871b0>, @cache=[], @download_count=0, @url="http://www.fueleconomy.gov/FEG/epadata/98guide6.zip", @format=nil, @headers=:first_row, @compression=:zip, @packing=nil, @streaming=false, @warn_on_multiple_downloads=true, @delimiter=",", @sheet=nil, @keep_blank_rows=false, @form_data=nil, @skip=0, @internal_encoding="UTF-8", @row_xpath=nil, @column_xpath=nil, @row_css=nil, @column_css=nil, @glob=nil, @filename="98guide6.csv", @transform_settings=nil, @cut=nil, @crop=nil, @schema=nil, @schema_name=nil, @pre_select=nil, @pre_reject=nil, @errata_settings=nil, @other_options={}, @transformer=#<RemoteTable::Transformer:0x00000101b8c2f0 @t=#<RemoteTable:0x00000101b87390 ...>, @legacy_transformer_mutex=#<Mutex:0x00000101b8c2a0>>, @local_copy=#<RemoteTable::LocalCopy:0x00000101b8bf58 @t=#<RemoteTable:0x00000101b87390 ...>, @encoded_io_mutex=#<Mutex:0x00000101b8be18>, @generate_mutex=#<Mutex:0x00000101b8bdc8>>>
>> t.rows.length
=> 806
>> t.rows.first.length
=> 26
>> require 'pp'
=> true
>> pp t[23]
{"Class"=>"TWO SEATERS",
 "Manufacturer"=>"PORSCHE",
 "carline name"=>"BOXSTER",
 "displ"=>"2.5",
 "cyl"=>"6",
 "trans"=>"Manual(M5)",
 "drv"=>"R",
 "cty"=>"19",
 "hwy"=>"26",
 "cmb"=>"22",
 "ucty"=>"21.2",
 "uhwy"=>"33.9499",
 "ucmb"=>"25.5114",
 "fl"=>"P",
 "G"=>"",
 "T"=>"",
 "S"=>"",
 "2pv"=>"",
 "2lv"=>"",
 "4pv"=>"",
 "4lv"=>"",
 "hpv"=>"",
 "hlv"=>"",
 "fcost"=>"956",
 "eng dscr"=>"",
 "trans dscr"=>""}

Columns and rows

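  • If there are headers (the default is :headers => :first_row), each row is a Hash with string keys.
  • If you set :headers => false, each row is an Array.
  • You can also supply your own header names as an Array, for example :headers => %w{ col1 col2 col3 }.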
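To picture the two row shapes, here is a sketch that uses Ruby's standard-library CSV rather than remote_table itself. The inline data is made up, and none of remote_table's own processing (downloading, encoding, blank-row handling) is reproduced; it only shows the Hash-versus-Array difference.

```ruby
require 'csv'

# Made-up sample data standing in for a downloaded file.
data = "Manufacturer,carline name,cyl\nPORSCHE,BOXSTER,6\n"

# With a header row: each row becomes a Hash keyed by the header strings,
# similar to remote_table's default (:headers => :first_row).
hash_rows = CSV.parse(data, :headers => true).map { |row| row.to_hash }

# Without headers: each row is a plain Array of strings, and the header
# line is just another row, similar to passing :headers => false.
array_rows = CSV.parse(data)

p hash_rows   # [{"Manufacturer"=>"PORSCHE", "carline name"=>"BOXSTER", "cyl"=>"6"}]
p array_rows  # [["Manufacturer", "carline name", "cyl"], ["PORSCHE", "BOXSTER", "6"]]
```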

Row keys

Row keys are strings. Row keys are NOT symbolized.

row['foobar'] # correct
row[:foobar]  # incorrect

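You can call symbolize_keys (from ActiveSupport) yourself, but we don't do it automatically: it would create a symbol for every column name of every table you open, and before Ruby 2.2 symbols were never garbage-collected.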
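If you do want Symbol keys, a core-Ruby conversion is enough. `symbolize_row` below is a hypothetical helper, not part of remote_table; it builds a new Hash and leaves the original row untouched. ActiveSupport's `Hash#symbolize_keys` does much the same if you already load ActiveSupport.

```ruby
# Hypothetical helper: returns a copy of a row Hash with Symbol keys.
# Hash[pairs] works on Ruby 1.8.7 and 1.9 as well as current Rubies.
def symbolize_row(row)
  Hash[row.map { |key, value| [key.to_sym, value] }]
end

row = { 'Manufacturer' => 'PORSCHE', 'cyl' => '6' }
sym_row = symbolize_row(row)

sym_row[:Manufacturer]  # => "PORSCHE"
row[:Manufacturer]      # => nil, the original row still has String keys
```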

Supported formats

  • Delimited (CSV, TSV, etc.). Library: fastercsv on Ruby 1.8, the standard library's CSV on Ruby 1.9. All RemoteTable::Delimited::PASSTHROUGH_CSV_SETTINGS, for example :col_sep, are passed directly to the CSV library.
  • Fixed width. Library: fixed_width-multibyte. You have to set up a :schema.
  • HTML. Library: nokogiri. Works like XML (see below).
  • ODS. Library: roo.
  • XLS. Library: roo.
  • XLSX. Library: roo.
  • XML. Library: nokogiri. Set up a :row_xpath or :row_css and, optionally, a :column_xpath or :column_css.
  • JSON. Library: JSON. Force JSON format with format: :json and name the root node with root_node, for example root_node: 'data'.
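A fixed-width :schema is a list of [name, width, options] entries in column order, as in the fixed-width examples further down. The sketch below shows how such a schema can slice one line. It is an illustration only: remote_table delegates the real work to the fixed_width-multibyte gem, and this sketch assumes that columns named 'spacer' are padding to be skipped.

```ruby
# Hypothetical sketch (not remote_table's implementation): slice one
# fixed-width line using [name, width, options] entries in column order.
def parse_fixed_width(line, schema)
  row = {}
  offset = 0
  schema.each do |name, width, _options|
    # String#[] returns nil when offset is past the end of a short line.
    field = line[offset, width] || ''
    row[name] = field.strip unless name == 'spacer'
    offset += width
  end
  row
end

schema = [[ 'code',   2, { :type => :string } ],
          [ 'spacer', 2 ],
          [ 'name',  10, { :type => :string } ]]

line = "AB  Widget    "
parse_fixed_width(line, schema)  # => {"code"=>"AB", "name"=>"Widget"}
```

Keeping widths rather than start/end offsets in the schema means each column's position is implied by the columns before it, which is why spacer entries are needed at all.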

Compression and packing

You can pick a file directly out of a remote archive by passing its path as :filename, or select it with a :glob pattern such as '/*.csv'. Supported archive types:

  • zip
  • tar
  • bz2
  • gz
  • exe (treated as zip)
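As a rough illustration of the list above, the sketch below guesses a compression and a packing from a file name (tar counts as packing, .exe is treated as zip) and picks archive entries with a glob. Both helpers are hypothetical and not part of remote_table's API; its real detection and extraction logic may differ. The glob matching assumes paths relative to the extracted archive root, using File::FNM_PATHNAME so that '*' does not cross '/', as with Dir.glob.

```ruby
# Hypothetical helper: guess [compression, packing] from a file name.
def guess_compression_and_packing(filename)
  name = filename.downcase
  compression =
    case name
    when /\.(zip|exe)\z/ then :zip   # exe is treated as zip
    when /\.bz2\z/       then :bz2
    when /\.(gz|tgz)\z/  then :gz
    end
  packing = (name =~ /\.(tar|tar\.gz|tar\.bz2|tgz)\z/) ? :tar : nil
  [compression, packing]
end

# Hypothetical helper: choose archive entries matching a glob like '/*.csv'.
def pick_entries(entries, glob)
  entries.select { |path| File.fnmatch(glob, path, File::FNM_PATHNAME) }
end

guess_compression_and_packing('98guide6.zip')  # => [:zip, nil]
guess_compression_and_packing('data.tar.gz')   # => [:gz, :tar]
pick_entries(['/98guide6.csv', '/readme.txt', '/sub/other.csv'], '/*.csv')
# => ["/98guide6.csv"]
```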

Encoding

Everything is converted to UTF-8. You can improve the quality of the conversion by specifying the original encoding with :encoding, for example :encoding => 'windows-1252'.

  • ASCII-8BIT and BINARY are equal
  • ISO-8859-1 and Latin1 are equal
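Why declaring the source encoding matters can be seen with Ruby's own String#encode. remote_table lists the iconv command-line tool among its requirements, so this is an analogy, not its code path; the bytes below are made up and spell "café" in ISO-8859-1.

```ruby
raw = "caf\xE9".b   # raw bytes, tagged ASCII-8BIT (no real encoding claimed)

# Assuming the wrong source encoding: these bytes are not valid UTF-8.
raw.dup.force_encoding('UTF-8').valid_encoding?   # => false

# Declaring the original encoding lets the conversion succeed.
utf8 = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
utf8.valid_encoding?                              # => true

# In Ruby, BINARY is likewise an alias for ASCII-8BIT.
Encoding.find('BINARY') == Encoding::ASCII_8BIT   # => true
```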

More examples

RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdHRNaVpSUWw2Z2VhN3RUV25yYWdQX2c&output=csv')

# aircraft fuel use equations derived from EMEP/EEA and ICAO
RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEhYenF3dGt1T0Y1cTdneUNsNjV0dEE&output=csv')

# distance classes from the WRI business travel tool and UK DEFRA/DECC GHG Conversion Factors for Company Reporting
RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFBKM0xWaUhKVkxDRmdBVkE3VklxY2c&hl=en&gid=0&output=csv')

# seat classes used in the WRI GHG Protocol calculation tools
RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdG9EdmxybG1wdC1iU3JRYXNkMGhvSnc&output=csv')

# pure automobile fuels
RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdE9xTEdueFM2R0diNTgxUlk1QXFSb2c&gid=0&output=csv')

# blended automobile fuels
RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEswNGIxM0U4U0N1UUppdWw2ejJEX0E&gid=0&output=csv')

# A list of hybrid make model years derived from the EPA fuel economy guide
RemoteTable.new('https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AoQJbWqPrREqdGtzekE4cGNoRGVmdmZMaTNvOWluSnc&output=csv')

# BTS aircraft type lookup table
RemoteTable.new("http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_AIRCRAFT_TYPE",
                :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEZ2d3JQMzV5T1o1T3JmVlFyNUZxdEE&output=csv' })

# aircraft made by whitelisted manufacturers whose ICAO code starts with 'B' from the FAA
# for definition of `Aircraft::Guru` and `manufacturer_whitelist?` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb
RemoteTable.new("http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-B.htm",
                :encoding => 'windows-1252',
                :row_xpath => '//table/tr[2]/td/table/tr',
                :column_xpath => 'td',
                :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGVBRnhkRGhSaVptSDJ5bXJGbkpUSWc&output=csv', :responder => Aircraft::Guru.new },
                :select => proc { |record| manufacturer_whitelist? record['Manufacturer'] })

# OpenFlights.org airports database
RemoteTable.new('https://openflights.svn.sourceforge.net/svnroot/openflights/openflights/data/airports.dat',
                :headers => %w{ id name city country_name iata_code icao_code latitude longitude altitude timezone daylight_savings },
                :select => proc { |record| record['iata_code'].present? },
                :errata => { :url => 'https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFc2UzhQYU5PWEQ0N21yWFZGNmc2a3c&gid=0&output=csv', :responder => Airport::Guru.new }) # see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb

# T100 flight segment data for a given month
# for definition of `form_data` and `FlightSegment::Guru` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/flight_segment/data_miner.rb
RemoteTable.new('http://www.transtats.bts.gov/DownLoad_Table.asp',
                :form_data => form_data,
                :compression => :zip,
                :glob => '/*.csv',
                :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGxpYU1qWFR3d0syTVMyQVVOaDd0V3c&output=csv', :responder => FlightSegment::Guru.new },
                :select => proc { |record| record['DEPARTURES_PERFORMED'].to_i > 0 })

# 1995 Fuel Economy Guide
# for definition of `:fuel_economy_guide_b` and `AutomobileMakeModelYearVariant::ParserB` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
RemoteTable.new("http://www.fueleconomy.gov/FEG/epadata/95mfgui.zip",
                :filename => '95MFGUI.DAT',
                :format => :fixed_width,
                :cut => '13-',
                :schema_name => :fuel_economy_guide_b,
                :select => proc { |row| row['model'].present? and (row['suppress_code'].blank? or row['suppress_code'].to_f == 0) and row['state_code'] == 'F' },
                :transform => { :class => AutomobileMakeModelYearVariant::ParserB, :year => 1995 },
                :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new })

# 1998 Fuel Economy Guide
# for definition of `AutomobileMakeModelYearVariant::ParserC` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
RemoteTable.new('http://www.fueleconomy.gov/FEG/epadata/98guide6.zip',
                :filename => '98guide6.csv',
                :transform => { :class => AutomobileMakeModelYearVariant::ParserC, :year => 1998 },
                :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new },
                :select => proc { |row| row['model'].present? })

# annual corporate average fuel economy data for domestic and imported vehicle fleets from the NHTSA
RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEdXWXB6dkVLWkowLXhYSFVUT01sS2c&hl=en&gid=0&output=csv',
                :errata => { :url => 'http://static.brighterplanet.com/science/data/transport/automobiles/make_fleet_years/errata.csv' },
                :select => proc { |row| row['volume'].to_i > 0 })

# total vehicle miles travelled by gasoline passenger cars from the 2010 EPA GHG Inventory
RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
                :filename => 'Annex Tables/Annex 3/Table A-87.csv',
                :skip => 1,
                :select => proc { |row| row['Year'].to_i.to_s == row['Year'] })

# total vehicle miles travelled from the 2010 EPA GHG Inventory
RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
                :filename => 'Annex Tables/Annex 3/Table A-87.csv',
                :skip => 1,
                :select => proc { |row| row['Year'].to_i.to_s == row['Year'] })

# total travel distribution from the 2010 EPA GHG Inventory
RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
                :filename => 'Annex Tables/Annex 3/Table A-93.csv',
                :skip => 1,
                :select => proc { |row| row['Vehicle Age'].to_i.to_s == row['Vehicle Age'] })

# building characteristics from the 2003 EIA Commercial Building Energy Consumption Survey
RemoteTable.new('http://www.eia.gov/emeu/cbecs/cbecs2003/public_use_2003/data/FILE02.csv',
                :skip => 1,
                :headers => ["PUBID8","REGION8","CENDIV8","SQFT8","SQFTC8","YRCONC8","PBA8","ELUSED8","NGUSED8","FKUSED8","PRUSED8","STUSED8","HWUSED8","ONEACT8","ACT18","ACT28","ACT38","ACT1PCT8","ACT2PCT8","ACT3PCT8","PBAPLUS8","VACANT8","RWSEAT8","PBSEAT8","EDSEAT8","FDSEAT8","HCBED8","NRSBED8","LODGRM8","FACIL8","FEDFAC8","FACACT8","MANIND8","PLANT8","FACDST8","FACDHW8","FACDCW8","FACELC8","BLDPLT8","ADJWT8","STRATUM8","PAIR8"])

# 2003 CBECS C17 - Electricity Consumption and Intensity - New England Division
# for definition of `CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER` see https://github.com/brighterplanet/earth/blob/master/lib/earth/industry/cbecs_energy_intensity/data_miner.rb
RemoteTable.new("http://www.eia.gov/emeu/cbecs/cbecs2003/detailed_tables_2003/2003set10/2003excel/C17.xls",
                :headers => false,
                :select => proc { |row| CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER.call(row) },
                :crop => (21..37))

# U.S. Census 2002 NAICS code list
RemoteTable.new('http://www.census.gov/epcd/naics02/naicod02.txt',
                :skip => 4,
                :headers => false,
                :delimiter => "\t")

# MECS table 3.2 Total US
RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table3_2.xls",
                :crop => (15..94),
                :headers => ["NAICS Code", "Subsector and Industry", "Total", "BLANK", "Net Electricity", "BLANK", "Residual Fuel Oil", "Distillate Fuel Oil", "Natural Gas", "BLANK", "LPG and NGL", "BLANK", "Coal", "Coke and Breeze", "Other"])

# MECS table 6.1 Midwest
RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table6_1.xls",
                :crop => (184..263),
                :headers => ["NAICS Code", "Subsector and Industry", "Consumption per Employee", "Consumption per Dollar of Value Added", "Consumption per Dollar of Value of Shipments"])

# U.S. Census Geographic Terms and Definitions
RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
                :skip => 6,
                :headers => %w{ Region Division FIPS Name },
                :select => proc { |row| row['Division'].to_i > 0 and row['FIPS'].to_i == 0 })

# state census divisions from the U.S. Census
RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
                :skip => 8,
                :headers => ['Region', 'Division', 'State FIPS', 'Name'],
                :select => proc { |row| row['State FIPS'].to_i > 0 })

# OpenGeoCode.org's Country Codes to Country Names list
RemoteTable.new('http://opengeocode.org/download/countrynames.txt',
                :format => :delimited,
                :delimiter => ';',
                :headers => false,
                :skip => 22)

# heating degree day data from WRI CAIT
RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDN4MkRTSWtWRjdfazhRdWllTkVSMkE&output=csv',
                :select => Proc.new { |record| record['country'] != 'European Union (27)' },
                :errata => { :url => 'https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDNSMUtCV0h4cUF4UnBKZlNkczlNbFE&output=csv' })

# US average grid loss factor derived from eGRID 2007 data
RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
                :sheet => 'USGC',
                :skip => 5)

# eGRID 2010 regions and loss factors
RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
                :sheet => 'STIE07',
                :skip => 4,
                :select => proc { |row| row['eGRID2010 year 2007 file state sequence number'].to_i.between?(1, 51) })

# eGRID 2010 subregions and electricity emission factors
RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010_Version1-1_xls_only.zip',
                :filename => 'eGRID2010V1_1_year07_AGGREGATION.xls',
                :sheet => 'SRL07',
                :skip => 4,
                :select => proc { |row| row['SEQSRL07'].to_i.between?(1, 26) })

# U.S. Census State ANSI Code file
RemoteTable.new('http://www.census.gov/geo/www/ansi/state.txt',
                :delimiter => '|',
                :select => proc { |record| record['STATE'].to_i < 60 })

# Mapping Hacks zipcode database
RemoteTable.new('http://mappinghacks.com/data/zipcode.zip',
                :filename => 'zipcode.csv')

# zipcode states and eGRID Subregions from the US EPA
RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/Power_Profiler_Zipcode_Tool_v3-2.xlsx',
                :sheet => 'Zip-subregion')

# horse breeds
RemoteTable.new('http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8')

# Brighter Planet's list of cat and dog breeds, genders, and weights
RemoteTable.new('http://static.brighterplanet.com/science/data/consumables/pets/breed_genders.csv',
                :encoding => 'ISO-8859-1',
                :select => proc { |row| row['gender'].present? })

# residential electricity prices from the EIA
RemoteTable.new('http://www.eia.doe.gov/cneaf/electricity/page/sales_revenue.xls',
                :select => proc { |row| row['Year'].to_s.first(4).to_i > 1989 })

# residential natural gas prices from the EIA
# for definition of `NaturalGasParser` see https://github.com/brighterplanet/earth/blob/master/lib/earth/residence/residence_fuel_price/data_miner.rb
RemoteTable.new('http://tonto.eia.doe.gov/dnav/ng/xls/ng_pri_sum_a_EPG0_FWA_DMcf_a.xls',
                :sheet => 'Data 1',
                :skip => 2,
                :select => proc { |row| row['year'].to_i > 1989 },
                :transform => { :class => NaturalGasParser })

# 2005 EIA Residential Energy Consumption Survey microdata
RemoteTable.new('http://www.eia.doe.gov/emeu/recs/recspubuse05/datafiles/RECS05alldata.csv',
                :headers => :upcase)

# Public albums from the Facebook Engineering Team
RemoteTable.new('https://graph.facebook.com/Engineering/albums', format: :json, root_node: 'data')

# ...and more from the tests...

RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA&single=true&gid=0'

RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'

RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA', :skip => 1, :headers => false

RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw'

RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw', :headers => %w{ col1 col2 col3 }

RemoteTable.new 'http://spreadsheets.google.com/pub?key=tujrgUOwDSLWb-P4KCt1qBg'

RemoteTable.new 'http://tonto.eia.doe.gov/dnav/pet/xls/PET_PRI_RESID_A_EPPR_PTA_CPGAL_M.xls', :transform => { :class => FuelOilParser }

RemoteTable.new 'http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8'

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/02data.zip', :filename => 'guide_jan28.xls'

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :filename => '2008_FE_guide_ALL_rel_dates_-no sales-for DOE-5-1-08.csv'

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :glob => '/*.csv'

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'

RemoteTable.new 'http://www.worldmapper.org/data/opendoc/2_worldmapper_data.ods', :sheet => 'Data', :keep_blank_rows => true

RemoteTable.new 'https://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'

RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx'

RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => %w{foo bar baz}

RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => false

RemoteTable.new 'http://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=293&Has_Group=3&Is_Zipped=0', :form_data => 'UserTableName=T_100_Segment__All_Carriers&[...]', :compression => :zip, :glob => '/*.csv'

RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-E.htm",
                :encoding => 'US-ASCII',
                :row_xpath => '//table/tr[2]/td/table/tr',
                :column_xpath => 'td'

RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
                :encoding => 'windows-1252',
                :row_xpath => '//table/tr[2]/td/table/tr',
                :column_xpath => 'td',
                :errata => Errata.new(:url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
                                      :responder => AircraftGuru.new)

RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
                :encoding => 'windows-1252',
                :row_xpath => '//table/tr[2]/td/table/tr',
                :column_xpath => 'td',
                :errata => { :url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
                             :responder => AircraftGuru.new }

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/00data.zip',
                :filename => 'Gd6-dsc.txt',
                :format => :fixed_width,
                :crop => 21..26, # inclusive
                :cut => '2-',
                :select => proc { |row| /\A[A-Z]/.match row['code'] },
                :schema => [[ 'code',   2, { :type => :string }  ],
                            [ 'spacer', 2 ],
                            [ 'name',   52, { :type => :string } ]]

RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
                :format => :fixed_width,
                :skip => 1,
                :schema => [[ 'header4', 10, { :type => :string }  ],
                            [ 'spacer',  1 ],
                            [ 'header5', 10, { :type => :string } ],
                            [ 'spacer',  12 ],
                            [ 'header6', 10, { :type => :string } ]]

RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
                :format => :fixed_width,
                :keep_blank_rows => true,
                :skip => 1,
                :schema => [[ 'header4', 10, { :type => :string }  ],
                            [ 'spacer',  1 ],
                            [ 'header5', 10, { :type => :string } ],
                            [ 'spacer',  12 ],
                            [ 'header6', 10, { :type => :string } ]]

RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.fixed_width.txt',
                :format => :fixed_width,
                :skip => 1,
                :schema => [[ 'header1', 10, { :type => :string }  ],
                            [ 'spacer',  1 ],
                            [ 'header2', 10, { :type => :string } ],
                            [ 'spacer',  12 ],
                            [ 'header3', 10, { :type => :string } ]]

RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.alternate_order.fixed_width.txt',
                :format => :fixed_width,
                :skip => 1,
                :schema => [[ 'spacer',  11 ],
                            [ 'header2', 10, { :type => :string }  ],
                            [ 'spacer',  1 ],
                            [ 'header3', 10, { :type => :string } ],
                            [ 'spacer',  1 ],
                            [ 'header1', 10, { :type => :string } ]]

Requirements

  • Unix tools like curl, iconv, perl, cat, cut, tail, etc. accessible from your $PATH
  • geo_ruby and dbf gems if you plan on fetching shapefiles
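One way to check the first requirement before relying on the gem is a small $PATH scan. `missing_tools` below is a hypothetical helper, not part of remote_table; the tool list is taken from the requirement above and is not exhaustive ("etc.").

```ruby
# Hypothetical helper: return the tools that are not executable files in
# any directory listed on $PATH (or in the path string you pass in).
def missing_tools(tools, path = ENV['PATH'].to_s)
  dirs = path.split(File::PATH_SEPARATOR)
  tools.reject do |tool|
    dirs.any? do |dir|
      candidate = File.join(dir, tool)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

missing = missing_tools(%w{ curl iconv perl cat cut tail })
warn "remote_table may fail; missing from $PATH: #{missing.join(', ')}" unless missing.empty?
```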

Wishlist

  • Windows (Win32) compatibility

Authors

Copyright

Copyright (c) 2012 Brighter Planet. See LICENSE for details.
