

raulpuric commented on August 21, 2024

I'm currently running the master branch against a subset of the Amazon review dataset we use for testing, and with the --lazy and --loose_json flags it works for me. Would you mind running cat /mnt/sdal/Datasets/NLP/Amazon/aggressive_dedup.json | head -c 10000 and dumping the output here?

from sentiment-discovery.

mkachuee commented on August 21, 2024

I fixed the problem by not using the --lazy option!

Here is a part of the output you requested:

{"reviewerID": "A000023026XVLM97BM7KY", "asin": "0957544901", "reviewerName": "T.Dobrowolski", "helpful": [0, 0], "reviewText": "This book is really great for those that have an artistic soul and looking for better life or business. I highly recommend it.", "overall": 4.0, "summary": "Use creativity to design your life", "unixReviewTime": 1397865600, "reviewTime": "04 19, 2014"}
{"reviewerID": "A00003262KNLZOSMMMFVV", "asin": "B002Y2U8MC", "reviewerName": "Harue Rojas", "helpful": [1, 1], "reviewText": "It is not a sticker, it is a Chritsmas story by itself, full of details, and cover a big space", "overall": 5.0, "summary": "Very complete", "unixReviewTime": 1361145600, "reviewTime": "02 18, 2013"}
{"reviewerID": "A00003262KNLZOSMMMFVV", "asin": "B004P598FY", "reviewerName": "Harue Rojas", "helpful": [1, 1], "reviewText": "LOve the size and the details, and its very colorful. It looks really nice and catch your attention when you pass by", "overall": 5.0, "summary": "Great details", "unixReviewTime": 1361145600, "reviewTime": "02 18, 2013"}
{"reviewerID": "A00003262KNLZOSMMMFVV", "asin": "B005MU3UE6", "reviewerName": "Harue Rojas", "helpful": [1, 1], "reviewText": "Its very colorful and the image once you put all de pieces in order is pretty cute. The window looks terrific", "overall": 5.0, "summary": "Its a very beautiful image!", "unixReviewTime": 1361145600, "reviewTime": "02 18, 2013"}
{"reviewerID": "A00003322NZ9C82Y46DFN", "asin": "0786903945", "reviewerName": "Kyle Downey", "helpful": [0, 0], "reviewText": "The condition of the book was exactly as described. There were minimal if any damages, and certainly nothing that would hinder the use of the book.", "overall": 5.0, "summary": "Great Condition", "unixReviewTime": 1403740800, "reviewTime": "06 26, 2014"}
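The excerpt above holds one JSON object per line. Assuming that is the layout --loose_json expects (an assumption; the thread never defines the flag), here is a minimal sketch of reading such a file one record at a time. `iter_loose_json` is a hypothetical helper, not part of sentiment-discovery.

```python
import io
import json
from typing import Iterable, Iterator

def iter_loose_json(lines: Iterable[str], key: str = "reviewText") -> Iterator[str]:
    """Yield record[key] for every non-blank line, parsing each line as its own JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline at end of file
        record = json.loads(line)
        yield record[key]

# Two records abbreviated from the excerpt above.
sample = io.StringIO(
    '{"reviewerID": "A000023026XVLM97BM7KY", "asin": "0957544901", "overall": 4.0, '
    '"summary": "Use creativity to design your life"}\n'
    '{"reviewerID": "A00003262KNLZOSMMMFVV", "asin": "B002Y2U8MC", "overall": 5.0, '
    '"summary": "Very complete"}\n'
)
texts = list(iter_loose_json(sample, key="summary"))
print(texts)  # ['Use creativity to design your life', 'Very complete']
```

An excerpt taken with head -c usually cuts its final line in the middle of an object, so `json.loads` would raise on that last line.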


raulpuric commented on August 21, 2024

It's really strange that --lazy would be the culprit. You will also need --lazy if you intend to run on multiple GPUs or scale up to datasets that do not fit into memory.
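Lazy loading avoids holding the whole dataset in memory. One common way to do that, assuming a layout like the one `make_lazy` below writes (all strings concatenated into one file, plus a list of cumulative end offsets), is to memory-map the file and slice out a single record on demand. `write_lazy` and `LazyReader` are hypothetical names, and the choice to count offsets in UTF-8 bytes is specific to this sketch, not necessarily what sentiment-discovery does.

```python
import mmap
import os
import tempfile
from typing import List

def write_lazy(path: str, strs: List[str]) -> List[int]:
    """Concatenate UTF-8 encoded strings into one file; return cumulative end byte offsets."""
    ends: List[int] = []
    total = 0
    with open(path, "wb") as f:
        for s in strs:
            data = s.encode("utf-8")
            f.write(data)
            total += len(data)
            ends.append(total)
    return ends

class LazyReader:
    """Random access to record i without reading the whole file into memory."""

    def __init__(self, path: str, ends: List[int]):
        self.ends = ends
        self._file = open(path, "rb")
        # mmap lets the OS page in only the bytes that are actually touched.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self) -> int:
        return len(self.ends)

    def __getitem__(self, i: int) -> str:
        # Record i spans [end of record i-1, end of record i).
        start = self.ends[i - 1] if i > 0 else 0
        return self._mm[start:self.ends[i]].decode("utf-8")

    def close(self) -> None:
        self._mm.close()
        self._file.close()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")
    ends = write_lazy(path, ["Great details", "Very complete", "Great Condition"])
    reader = LazyReader(path, ends)
    print(len(reader), reader[1])  # 3 Very complete
    reader.close()
```

Only the offsets list lives in memory; each lookup touches just the bytes of one record.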

Just to confirm, you're on the master branch, right?


mkachuee commented on August 21, 2024


raulpuric commented on August 21, 2024

Would you mind trying this lazy_loader modification and running with --lazy? I think this works for me.

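import os
import pickle as pkl
import time

import torch.distributed as dist


def make_lazy(path, strs, data_type='data'):
    """Write strs to a lazy-loadable file plus a pickle of cumulative end offsets."""
    # get_lazy_path is a helper defined elsewhere in the lazy_loader module
    lazypath = get_lazy_path(path)
    # exist_ok avoids a race when several ranks create the directory at once
    os.makedirs(lazypath, exist_ok=True)
    datapath = os.path.join(lazypath, data_type)
    lenpath = os.path.join(lazypath, data_type+'.len.pkl')
    # only rank 0 (or a non-distributed run) writes the lazy files
    distributed = dist.is_available() and dist.is_initialized()
    if not distributed or dist.get_rank() == 0:
        with open(datapath, 'w') as f:
            str_ends = []
            str_cnt = 0
            for s in strs:
                f.write(s)
                # cumulative end offset of each string, counted in characters
                str_cnt += len(s)
                str_ends.append(str_cnt)
        # write to a temp file, then rename, so waiting ranks never see a partial pickle
        tmppath = lenpath + '.tmp'
        with open(tmppath, 'wb') as lenfile:
            pkl.dump(str_ends, lenfile)
        os.replace(tmppath, lenpath)
    else:
        # other ranks block until rank 0 has finished writing the offsets
        while not os.path.exists(lenpath):
            time.sleep(1)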
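One property of this patch is worth keeping in mind. `str_cnt += len(s)` accumulates *character* counts, but the file is written in text mode, where the number of bytes on disk depends on the encoding. If the code that reads the lazy file back slices by byte position (an assumption; that reader is not shown in this thread), the two only agree for pure-ASCII text. The quick check below uses hypothetical helpers `char_ends` and `byte_ends` and assumes UTF-8 to show where they diverge.

```python
from itertools import accumulate
from typing import List

def char_ends(strs: List[str]) -> List[int]:
    """Cumulative end offsets counted in characters, as `str_cnt += len(s)` does."""
    return list(accumulate(len(s) for s in strs))

def byte_ends(strs: List[str], encoding: str = "utf-8") -> List[int]:
    """Cumulative end offsets counted in encoded bytes, i.e. positions in the file on disk."""
    return list(accumulate(len(s.encode(encoding)) for s in strs))

ascii_strs = ["Great details", "Very complete"]
print(char_ends(ascii_strs), byte_ends(ascii_strs))  # [13, 26] [13, 26]

# Illustrative non-ASCII string: "è" takes two bytes in UTF-8.
non_ascii_strs = ["Très bien", "Very complete"]
print(char_ends(non_ascii_strs), byte_ends(non_ascii_strs))  # [9, 22] [10, 23]
```

Every offset after the first non-ASCII character is shifted, so a byte-based reader would slice records at the wrong positions from that point on.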


mkachuee commented on August 21, 2024

I tested it, and it produces the same error.


raulpuric commented on August 21, 2024

Do you have a Docker container I could use to reproduce this?


mkachuee commented on August 21, 2024

Sorry, I don't.


raulpuric commented on August 21, 2024

Your problem should be fixed by #31, which should be merged into master shortly.
