We need a script, executable, or process that automates browsing to WhatsApp Web, opens selected chats & groups, and syncs down the full message history, in order and data-consistent, from the live/changing page into a saved format (database / structured HTML data files), with media files downloaded to a related file storage directory.
We need the source code of the working project as a clean, readable, well-designed codebase: properly indented & spaced, following good coding practices, with up-to-date libraries & dependencies.
Target Browser: Google Chrome. Optional: Also other/cross browser.
Possible technologies:
- Selenium WebDriver with Java, Python, Node.js JavaScript etc.
- Chrome Extension using HTML/CSS/JavaScript (lacking database/file-system facilities)
- Tampermonkey userscript using JavaScript (lacking database/file-system facilities)
- Other
- Any appropriate technology stack you might recommend
Data Wanted: Any & all available data that can be obtained out of the webpage.
Some example data points: group/chat name; message date & time; all author information (phone number and/or name when available); full message content including emojis, Unicode content & links; all media (audio, image, video, documents etc.); in-reply-to message references; ordinal position relative to the previous and next message; author/group/status & quoted content of the referenced message; whether the message is deleted; whether the message is forwarded (and forwarding details); and ANY other data obtainable from the webpage.
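To make the data points above concrete, here is a minimal sketch of a record type that could hold one scraped message. The class names (`MessageRecord`, `MediaRef`) and field names are illustrative assumptions, not a required schema; the final structure would depend on what the page actually exposes.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MediaRef:
    """One attachment belonging to a message (hypothetical structure)."""
    kind: str                          # e.g. "audio", "image", "video", "document"
    local_path: Optional[str] = None   # set once the file is saved to disk
    downloaded: bool = False


@dataclass
class MessageRecord:
    """One message as read from the page (field names are assumptions)."""
    chat_name: str
    sent_at: datetime
    author_name: Optional[str]
    author_phone: Optional[str]
    text: str
    ordinal: int                       # position within the chat as observed
    reply_to_id: Optional[str] = None  # identifier of the referenced message
    quoted_text: Optional[str] = None
    is_deleted: bool = False
    is_forwarded: bool = False
    media: list[MediaRef] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON, keeping emojis and other Unicode readable."""
        data = asdict(self)
        data["sent_at"] = self.sent_at.isoformat()
        return json.dumps(data, ensure_ascii=False)
```

A record like this can be written to a database row or to a structured data file; `ensure_ascii=False` keeps emojis intact in the saved output.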
Some requirements descriptions and known complications/challenges:
- There needs to be communication between the already persisted database & file store and the script processing the webpage, in order to identify messages and determine what has already been downloaded and is up to date, and what still needs to be fetched from the webpage.
- Data consistency: distinctly identifying messages correctly (perhaps by hashing a digest based on message group, date & time, message content, ordinal position relative to previous/next messages, other identifying factors etc.)
- Progressively & continuously syncing history from the WhatsApp Web interface down to the database/file store, keeping track of where the process currently stands, which messages are already synced, which need updating/re-downloading etc.
- Indexing & searching/seeking into message history
- Will probably need a lot of deferred & asynchronous processing: waiting for media to download into the browser, detecting timeouts/failures, retrying, and saving state about what needs to be retried in future processing
- Possibly using the group info/chat info Media panel to access & load media files
- Apparently the only way to get full message text with emojis etc. in whole is by selecting the full text content and copying to clipboard.
- Dealing with long messages whose content is collapsed behind 'Read more'
- Scrolling up/down the message history roll and waiting for messages to load in infinite-scroll loading batches.
- Downloading media, waiting for media to load, retry failures, store in organized folder structure with relations saved to database/HTML data files.
- How to download video clips
- Other complications that might be discovered in the effort to sync down WhatsApp per-chat history
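The message-identification and sync-tracking requirements above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a finished design: the function names (`message_digest`, `save_if_new`, `pending_media`, `record_media_result`), the table layout, and the idea of chaining the previous message's digest as a stand-in for ordinal position are all assumptions.

```python
import hashlib
import sqlite3
import unicodedata
from typing import List, Tuple


def message_digest(chat: str, sent_at: str, author: str, text: str,
                   prev_digest: str = "") -> str:
    """Stable ID built from fields visible on the page (assumed inputs)."""
    parts = [chat, sent_at, author, unicodedata.normalize("NFC", text), prev_digest]
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    payload = "".join(f"{len(p)}:{p}" for p in parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the message store; the schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               digest TEXT PRIMARY KEY,
               chat TEXT NOT NULL,
               sent_at TEXT NOT NULL,
               author TEXT NOT NULL,
               text TEXT NOT NULL,
               media_state TEXT NOT NULL DEFAULT 'none',
               media_attempts INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def save_if_new(conn: sqlite3.Connection, chat: str, sent_at: str, author: str,
                text: str, prev_digest: str = "",
                has_media: bool = False) -> Tuple[str, bool]:
    """Insert the message unless its digest is already stored."""
    digest = message_digest(chat, sent_at, author, text, prev_digest)
    cur = conn.execute(
        "INSERT OR IGNORE INTO messages "
        "(digest, chat, sent_at, author, text, media_state) VALUES (?, ?, ?, ?, ?, ?)",
        (digest, chat, sent_at, author, text, "pending" if has_media else "none"),
    )
    conn.commit()
    return digest, cur.rowcount == 1  # True only when a row was actually added


def pending_media(conn: sqlite3.Connection, max_attempts: int = 3) -> List[str]:
    """Digests whose media still needs downloading and may be retried."""
    rows = conn.execute(
        "SELECT digest FROM messages "
        "WHERE media_state = 'pending' AND media_attempts < ? ORDER BY sent_at",
        (max_attempts,),
    )
    return [row[0] for row in rows]


def record_media_result(conn: sqlite3.Connection, digest: str, ok: bool) -> None:
    """Count the attempt and mark the media done, or leave it pending for retry."""
    conn.execute(
        "UPDATE messages SET media_attempts = media_attempts + 1, media_state = ? "
        "WHERE digest = ?",
        ("done" if ok else "pending", digest),
    )
    conn.commit()
```

Chaining `prev_digest` is one way to tell apart identical messages sent by the same author at the same displayed time. The trade-off is that a message's ID changes if an earlier message in the chain is read differently on a later pass, so a real implementation might prefer a stable per-message identifier from the page if one turns out to be available.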
About the recruiter: Yusuf Jepara, from Trenciansky, Slovakia. Member since Mar 14, 2020.