- I have worked on the Apache Beam GroupIntoBatches transform using the dataset games.csv.
- My Google Colab Notebook on GroupIntoBatches
- Demonstration Video link
- My personal repo link
- Python
- Apache Beam
- Google Colab
- First, install apache-beam using the command below:
!pip install --quiet -U apache-beam
- Install the other dependencies (quoting the package spec keeps the shell from treating the square brackets as a glob pattern):
!pip install 'apache-beam[gcp,aws,test,docs]'
- The command that lists all the files in the current directory:
!ls
- First, upload your .csv file to your Google Drive account.
- Use the same Google account (email) for both Google Drive and Google Colab.
- Import the .csv file into Google Colab by running the code below:
# Code to read csv file into colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate the Google account (email ID)
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
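- To get the ID of the file, right-click the file in your Google Drive account, select the share link option, then copy the link.
- The link typically has the form https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing.
- Remove everything except the <FILE_ID> part and keep only the ID.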
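The ID-extraction step above can also be done in code. This is a minimal sketch with a hypothetical helper, `extract_drive_file_id`, assuming the share link has one of two common forms: `https://drive.google.com/file/d/<ID>/view?...` or `https://drive.google.com/open?id=<ID>`.

```python
import re

# Assumed link formats: ".../file/d/<ID>/..." and "...?id=<ID>".
_DRIVE_ID_PATTERNS = (
    re.compile(r"/d/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
)


def extract_drive_file_id(share_link: str) -> str:
    """Hypothetical helper: return the file ID embedded in a Google Drive share link."""
    for pattern in _DRIVE_ID_PATTERNS:
        match = pattern.search(share_link)
        if match:
            return match.group(1)
    raise ValueError(f"No Drive file ID found in: {share_link!r}")


link = "https://drive.google.com/file/d/1b73yN7MjGytqSP5wimYAQmtByOvGGe8Y/view?usp=sharing"
print(extract_drive_file_id(link))  # prints 1b73yN7MjGytqSP5wimYAQmtByOvGGe8Y
```

The returned string is what goes into the `'id'` field of `drive.CreateFile`.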
# Get the file from Drive using its file ID
downloaded = drive.CreateFile({'id':'1b73yN7MjGytqSP5wimYAQmtByOvGGe8Y'}) # replace with the ID of the file you want to access
downloaded.GetContentFile('superbowl-ads.csv')
- Command to display the contents of the output file written by the pipeline:
!cat output.txt-00000-of-00001 # output file