Apify


130 repositories

crawlee
Public
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping
TypeScript • Apache License 2.0 • 669 forks • 16k stars • 117 issues • 14 pull requests • Updated Nov 19, 2024
crawlee-python
Public
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
python crawler scraper automation web-crawler headless scraping crawling pip web-scraping
Python • Apache License 2.0 • 319 forks • 4.6k stars • 82 issues • 8 pull requests • Updated Nov 19, 2024
docusaurus-plugin-typedoc-api
Public
Apify's fork of `docusaurus-plugin-typedoc-api`, customized for our Python documentation.
TypeScript • 26 forks • 0 stars • 0 issues • 0 pull requests • Updated Nov 19, 2024
apify-sdk-python
Public
The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.
automation scraping apify python sdk
Python • Apache License 2.0 • 11 forks • 120 stars • 14 issues • 1 pull request • Updated Nov 19, 2024
apify-cli
Public
Apify command-line interface helps you create, develop, build and run Apify actors, and manage the Apify cloud platform.
command-line headless-chrome puppeteer serveless apify
TypeScript • 19 forks • 122 stars • 35 issues • 3 pull requests • Updated Nov 19, 2024
openapi
Public
An OpenAPI specification for the Apify API.
JavaScript • MIT License • 0 forks • 2 stars • 17 issues • 3 pull requests • Updated Nov 19, 2024
workflows
Public
Apify's reusable GitHub workflows.
Python • 4 forks • 7 stars • 4 issues • 4 pull requests • Updated Nov 19, 2024
actor-whitepaper
Public
This whitepaper describes a new concept for building serverless microapps called Actors, which are easy to develop, share, integrate, and build upon. Actors are a reincarnation of the UNIX philosophy for programs running in the cloud.
Apache License 2.0 • 0 forks • 2 stars • 7 issues • 4 pull requests • Updated Nov 19, 2024
apify-shared-js
Public
Utilities and constants shared across Apify projects.
TypeScript • Apache License 2.0 • 11 forks • 12 stars • 5 issues • 0 pull requests • Updated Nov 19, 2024
apify-eslint-config
Public
Apify ESLint preset shared across projects.
JavaScript • Apache License 2.0 • 0 forks • 2 stars • 1 issue • 0 pull requests • Updated Nov 18, 2024
apify-client-js
Public
Apify API client for JavaScript / Node.js.
JavaScript • Apache License 2.0 • 27 forks • 68 stars • 16 issues • 4 pull requests • Updated Nov 18, 2024
rag-web-browser
Public
RAG Web Browser is an Apify Actor to feed your LLM applications and RAG pipelines with up-to-date text content scraped from the web.
scraper ai crawling serp rag llm
TypeScript • Apache License 2.0 • 0 forks • 3 stars • 1 issue • 1 pull request • Updated Nov 18, 2024
apify-sdk-js
Public
Apify SDK monorepo
actor apify nodejs javascript typescript sdk
TypeScript • Apache License 2.0 • 35 forks • 123 stars • 10 issues • 7 pull requests • Updated Nov 18, 2024
apify-client-python
Public
Apify API client for Python
api client scraping apify python
Python • Apache License 2.0 • 11 forks • 49 stars • 8 issues • 3 pull requests • Updated Nov 18, 2024
apify-docs
Public
This project is the home of Apify's documentation.
API Blueprint • Apache License 2.0 • 76 forks • 29 stars • 70 issues • 22 pull requests • Updated Nov 18, 2024
fingerprint-suite
Public
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
scraping fingerprinting playwright typescript puppeteer
TypeScript • Apache License 2.0 • 103 forks • 987 stars • 19 issues • 11 pull requests • Updated Nov 18, 2024
proxy-chain
Public
Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
javascript-library headless-chrome proxy-server proxychains
JavaScript • Apache License 2.0 • 145 forks • 850 stars • 7 issues • 11 pull requests • Updated Nov 17, 2024
apify-zapier-integration
Public
Apify integration for Zapier
api zapier web-scraping apify
JavaScript • Apache License 2.0 • 1 fork • 8 stars • 5 issues • 2 pull requests • Updated Nov 15, 2024
make-integrations-scraper
Public
Scrapes the list of available integrations from Make.
TypeScript • 0 forks • 0 stars • 0 issues • 1 pull request • Updated Nov 15, 2024
zapier-integrations-scraper
Public
Scrapes the list of Zapier integrations from the Zapier website.
TypeScript • 0 forks • 0 stars • 0 issues • 1 pull request • Updated Nov 15, 2024
actor-vector-database-integrations
Public
Transfers data from Apify Actors to vector databases: Chroma, Milvus, Pinecone, PostgreSQL (PG-Vector), Qdrant, and Weaviate.
Python • Apache License 2.0 • 4 forks • 4 stars • 0 issues • 0 pull requests • Updated Nov 14, 2024
actor-beautifulsoup-scraper
Public
Python • Apache License 2.0 • 0 forks • 3 stars • 0 issues • 0 pull requests • Updated Nov 13, 2024
pull-request-toolkit-action
Public
A GitHub Action that ensures each PR is set up correctly and has a milestone assigned.
TypeScript • Apache License 2.0 • 1 fork • 1 star • 1 issue • 0 pull requests • Updated Nov 13, 2024
apify-actor-docker
Public
Base Docker images for Apify actors.
Dockerfile • Apache License 2.0 • 22 forks • 70 stars • 9 issues • 3 pull requests • Updated Nov 8, 2024
actor-aws-costs-to-slack
Public
This tool integrates with AWS to monitor service usage costs and posts a summary of these costs to a Slack channel. The summary includes costs for various AWS services along with a chart that provides a visual breakdown of the costs over time.
TypeScript • MIT License • 0 forks • 0 stars • 0 issues • 1 pull request • Updated Nov 5, 2024
actor-templates
Public
This project is the 🏠 home of Apify actor template projects to help users quickly get started.
Python • 18 forks • 26 stars • 8 issues • 1 pull request • Updated Oct 25, 2024
homebrew-tap
Public
A Homebrew tap for Apify tools
Ruby • 1 fork • 8 stars • 0 issues • 4 pull requests • Updated Oct 25, 2024
got-scraping
Public
HTTP client for scraping, based on got.
TypeScript • 44 forks • 557 stars • 15 issues • 1 pull request • Updated Oct 23, 2024
release-pr-action
Public
This action simplifies creating release PRs.
JavaScript • Apache License 2.0 • 0 forks • 0 stars • 0 issues • 0 pull requests • Updated Oct 23, 2024
airbyte
Public
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Python • Other license • 4.1k forks • 0 stars • 0 issues • 0 pull requests • Updated Oct 3, 2024