This notebook provides a quick overview for getting started with the UnDatasIO document loader. UnDatasIO loads and parses several document formats, including PDF, PNG, JPG, JPEG, and JFIF, through its secure cloud API, and supports lazy loading and native async. The parsed output is ready for generative AI workflows such as RAG.

For detailed documentation on all features and configurations, refer to the official API reference.
Document( metadata={'source': 'demo.pdf', 'task_id': 't1', 'file_id': 'f1'}, page_content='Growing a Tail: Increasing Output Diversity in Large Language Models\n\nAuthors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*\n\nAffiliations:\n\n1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.\n\n2Faculty of Computer Science, Technion – I')
print(docs[0].page_content[:300])
Growing a Tail: Increasing Output Diversity in Large Language Models

Authors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*

Affiliations:

1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.

2Faculty of Computer Science, Technion – I
UnDatasIOLoader supports lazy loading for memory-efficient iteration.
pages = []
for doc in loader.lazy_load():
    pages.append(doc)

pages[0]
Document( metadata={'source': 'demo.pdf', 'task_id': 't1', 'file_id': 'f1'}, page_content='Growing a Tail: Increasing Output Diversity in Large Language Models\n\nAuthors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*\n\nAffiliations:\n\n1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.\n\n2Faculty of Computer Science, Technion – I')
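Collecting every document into a list, as above, gives up most of the memory benefit of lazy loading. A common alternative is to consume the stream in fixed-size batches and hand each batch to a downstream step such as embedding or indexing. The following is a minimal sketch of that pattern, not part of the UnDatasIO API: the `Document` dataclass and `fake_lazy_load` are hypothetical stand-ins so the example runs without an API key, and `in_batches` is an illustrative helper name. In real use, the iterator returned by `loader.lazy_load()` would presumably take the place of `fake_lazy_load(...)`.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterable, Iterator, List


@dataclass
class Document:
    """Hypothetical stand-in for the Document objects shown above."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_lazy_load(n: int) -> Iterator[Document]:
    """Hypothetical substitute for loader.lazy_load(): yields one Document at a time."""
    for i in range(n):
        yield Document(page_content=f"page {i}", metadata={"source": "demo.pdf"})


def in_batches(docs: Iterable[Document], batch_size: int) -> Iterator[List[Document]]:
    """Group a lazy stream of documents into lists of at most batch_size items."""
    it = iter(docs)
    while True:
        # islice pulls only batch_size items from the stream at a time
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Only one batch is held in memory at any point in the loop.
batch_sizes = [len(batch) for batch in in_batches(fake_lazy_load(5), batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
```

Because `in_batches` accepts any iterable, the same helper works unchanged whether the documents come from a generator or from an already materialized list.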