Working with DataFrames.jl beyond CSV files

07/25/2023, 5:30 PM — 8:30 PM UTC
32-144

Abstract:

Data scientists need to work with various data sources and sinks in their projects. During the workshop you will learn how to work with standard data formats using DataFrames.jl. Special focus will be put on working with data that is larger than the available RAM.

Description:

Data science pipelines created in Julia typically need to be integrated into larger workflows involving various tools and technologies. Ensuring interoperability is therefore important, especially for large data that does not fit into the RAM of a single machine. During the workshop I plan to discuss working with the following data formats, along with their pros and cons:

  • CSV;
  • Apache Arrow;
  • Apache Parquet;
  • JSON;
  • statistical package formats, using RData and Stata files as examples;
  • databases, using SQLite and DuckDB as examples.
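As an illustration of the database case, here is a minimal sketch, assuming SQLite.jl and DataFrames.jl are installed. It is not taken from the workshop materials, and the table name `people` and the data are hypothetical. It shows a DataFrame being written into an in-memory SQLite database and a query result being read back as a DataFrame:

```julia
using DataFrames, SQLite

# Hypothetical example data, not from the workshop
df = DataFrame(id = 1:3, name = ["a", "b", "c"], score = [0.5, 1.5, 2.5])

# In-memory SQLite database; pass a file path instead to persist it on disk
db = SQLite.DB()

# SQLite.load! accepts any Tables.jl source, so a DataFrame can be passed directly;
# "people" is a hypothetical table name
SQLite.load!(df, db, "people")

# The query result is itself a Tables.jl source, so it can be materialized as a DataFrame.
# DBInterface is reached through the SQLite module to avoid adding it as a separate dependency.
res = DataFrame(SQLite.DBInterface.execute(db,
    "SELECT * FROM people WHERE score > 1 ORDER BY id"))
```

The filtering happens inside the database, so only the matching rows are loaded into Julia. This matters when the full table is too large for memory.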

All the examples will use DataFrames.jl, which provides a representative implementation of the Tables.jl table interface.
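Because both DataFrames.jl and format packages such as Arrow.jl speak the Tables.jl interface, a DataFrame can be written to a file and read back without any conversion code. The following is a minimal sketch, assuming Arrow.jl is installed; the data is hypothetical. To our understanding, Arrow.jl memory-maps the files it reads, so `copycols = false` wraps the file's columns instead of copying them into RAM. This is one possible approach for data larger than memory, not necessarily the one used in the workshop.

```julia
using DataFrames, Arrow

# Hypothetical example data, not from the workshop
df = DataFrame(id = 1:3, name = ["a", "b", "c"], score = [0.5, 1.5, 2.5])

# A DataFrame is a Tables.jl source, so Arrow.write accepts it directly
path = tempname() * ".arrow"
Arrow.write(path, df)

# Arrow.Table is also a Tables.jl source
tbl = Arrow.Table(path)

# Default constructor copies the columns into ordinary in-memory vectors
df2 = DataFrame(tbl)

# copycols = false reuses the Arrow-backed columns without copying them
df3 = DataFrame(tbl; copycols = false)
```

The same pattern, a package-specific reader followed by `DataFrame(...)`, applies to the other Tables.jl-compatible formats listed above.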
