
Developers of artificial intelligence (AI) systems often rely heavily on large data sets to train their models. This data is frequently scraped from social media platforms and other publicly available websites. While AI developers contend such practices are essential to train their models,[1] the legality of obtaining personal information in this manner is being increasingly scrutinised by the Office of the Australian Information Commissioner (OAIC) and international regulators.[2]

This article looks at the latest regulatory actions and guidance issued by the OAIC and the key privacy issues arising from data scraping.

What is data scraping?

The term ‘data scraping’ describes the importing or extracting of information from a website or publicly available database into a spreadsheet or other application, often contrary to the terms and conditions of the website or database where the information originated or is stored.

Data scraping can be classed into two categories:

  1. Screen scraping – the extraction of various types of visual data including text, images, and links present on a screen.[3]
  2. Web scraping – the extraction of information from an entire webpage (for example, social media websites). Web scraping is not limited to the data visually presented on the screen and includes tabular data, source code, and HTML information.[4] Web scraping is more widely used by AI developers because it can be fully automated, which makes it better suited to large-scale data extraction.[5]
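
To illustrate the difference between the two categories, the following is a minimal sketch of how a web scraper can read data from a page's markup that never appears on screen. It uses only Python's standard-library HTML parser and runs against a small made-up page held in a string; the class name `PageExtractor`, the function `extract` and the sample page are hypothetical and are not drawn from any tool discussed in this article.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects on-screen text plus data found only in the markup."""

    def __init__(self):
        super().__init__()
        self.visible_text = []  # roughly what a screen scraper would capture
        self.links = []         # href values, present only in the source code
        self.meta = {}          # <meta name=... content=...> pairs
        self._skip = 0          # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.visible_text.append(text)


# Hypothetical sample page used purely for illustration.
SAMPLE_PAGE = """
<html><head><meta name="author" content="Example Author"></head>
<body><h1>Listing</h1><p>Contact <a href="/agents/42">our agent</a>.</p></body></html>
"""


def extract(html):
    """Parse an HTML string and return the populated extractor."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return parser


result = extract(SAMPLE_PAGE)
print(result.visible_text)  # text a person would see on screen
print(result.links)         # link targets taken from the markup
print(result.meta)          # metadata that is not displayed at all
```

In this sketch the author name and the link target are recovered even though neither is displayed to a visitor, which is why web scraping can capture more personal information than the visible page suggests.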

AI system developers argue that mass data collection is essential for the functionality of AI systems.[6] The data is used to train AI models and to prevent feedback loops that could degrade their outputs. However, the collection of personal information via data scraping raises various privacy issues.

Recent data scraping case – Property Lovers interfered with Australians’ privacy by scraping data

OAIC investigation

The OAIC recently launched an investigation into data scraping practices carried out by Property Lovers Pty Ltd (Property Lovers).[7] Property Lovers trained subscribers to identify and target distressed properties and sellers by compiling 'leads lists'.[8] The lists were generated by scraping personal data from third-party websites, including daily court listings, published death and funeral notices, and property listings accessed through CoreLogic.

The Privacy Commissioner found that Property Lovers’ data scraping practices breached several of the Australian Privacy Principles (APPs).  In particular:

  • Lawful and fair collection (APP 3.5) – Property Lovers failed to collect personal information in a lawful and fair manner as:
    • personal information was collected in violation of the terms of use of third-party websites
    • the individuals had no reasonable expectation that their personal information would be collected in the way it was by Property Lovers, and
    • many of the affected individuals were in vulnerable circumstances.
  • Notification of collection (APP 5.1) – Property Lovers failed to notify individuals of certain matters relating to the collection of their personal information. In particular, Property Lovers should have:
    • contacted individuals to notify them that their personal information had been collected and the purpose of collection, and
    • included required details relating to the collection of personal information in its privacy policy.
  • Quality of personal information (APP 10.2) – Property Lovers did not take reasonable steps to verify the accuracy of the data it used and disclosed.

Regulatory actions taken

As a result of these breaches, the Privacy Commissioner ordered Property Lovers to:

  • immediately cease the collection of personal information from third-party sources, destroy all lists containing unlawfully scraped information within 30 days, and provide evidence of compliance to the OAIC
  • update its privacy policies to reflect lawful data handling practices, and
  • publish a formal apology acknowledging its privacy violations.[9]

Key takeaways

The determination serves as a timely reminder that, while the collection of large amounts of data is seen as vital for many business initiatives, that collection (particularly where personal information is involved) must comply with legal and ethical obligations. In addition, while the case involved human-compiled data, the privacy issues raised apply equally to those who engage in AI-aided data scraping.

OAIC guidance for AI developers

The OAIC has recently released privacy guidance for AI developers.[10] These guidelines emphasise that:

  • Personal information must be collected directly from the individual unless it is unreasonable or impracticable to do so.
  • Personal information must be collected by lawful and fair means. Covert collection of personal information, including through data scraping, may amount to unfair collection. However, this will depend on factors such as:
    • an individual’s reasonable expectations about collection and use of their personal information
    • the sensitivity of the personal information
    • the intended purpose of the collection
    • the risk of harm to individuals as a result of the collection, and
    • steps taken by the developer to prevent privacy impacts, including processes to delete or de-identify personal information.
  • Sensitive information may not be collected without consent (unless an exception applies). Neither a failure to object to a proposal to handle information in a particular way nor the fact that personal information is publicly available should be taken as consent.

Consent must be informed, voluntary, and revocable, given by individuals who fully understand how their information will be used.[11] AI-driven data scraping rarely meets these criteria, as individuals are typically unaware that their data has been extracted or how it will be processed. Furthermore, the sheer scale and speed of AI-driven collection make it nearly impossible to ensure proper consent mechanisms or allow individuals to withdraw consent effectively once their data has been aggregated.[12]
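
One of the factors listed above is the steps a developer takes to delete or de-identify personal information. As a rough sketch only, a developer might strip obvious identifiers from scraped text before storing it. The example below assumes email addresses and Australian-style phone numbers are the identifiers of interest; the patterns and the function name `redact` are hypothetical, and names, addresses and other identifiers are not handled.

```python
import re

# Assumed patterns, for illustration only: a simple email matcher and an
# Australian-style phone number (leading 0 or +61, then nine more digits).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?<!\d)(?:\+?61|0)[ -]?[2-478](?:[ -]?\d){8}(?!\d)")


def redact(text):
    """Replace matched email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


print(redact("Call 0412 345 678 or email jane@example.com"))
```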

Conclusion

While data scraping may be seen by some as necessary to obtain the vast amounts of data needed to train AI models, and may be tempting because the data is publicly available, the privacy-related issues should be considered carefully before going down this path.

 

[1]   Roberta Aukstikalnyte, ‘The Essential Role of Web Scraping in AI Model Training’ (Web Page, 2025)

[2]  Taner Kuru, ‘Lawfulness of the Mass Processing of Publicly Accessible Online Data to Train Large Language Models’ (2024) International Data Privacy Law; Office of the Australian Information Commissioner, ‘Global Expectations of Social Media Platforms and Other Sites to Safeguard Against Unlawful Data Scraping’ (Web Page, 2025)

[3]   Fortra, ‘What is Screen Scraping and How Does it Work?’ (Web Page, 2025) 

[4]   Gino Fontana, ‘Web Scraping: Jurisprudence and Legal Doctrines’ (2024) The Journal of World Intellectual Property

[5]   Fortra, ‘What is Screen Scraping and How Does it Work?’ (Web Page, 2025) 

[6]  Ilia Shumailov et al, ‘AI Models Collapse When Trained on Recursively Generated Data’ (2024) 631 Nature 755.

[7]   Office of the Australian Information Commissioner, ‘Grubisa Companies Interfered with Australians’ Privacy by Scraping Data’ (Web Page, 2025)

[8]   Distressed properties: where owners facing divorce, bankruptcy, or deceased estates might sell below market value.

[9]   Office of the Australian Information Commissioner, ‘Grubisa Companies Interfered with Australians’ Privacy by Scraping Data’ (Web Page, 2025)

[10]  Office of the Australian Information Commissioner, ‘Guidance on Privacy and Developing and Training Generative AI Models’ (Web Page, 2024) 

[11]  As such, it is insufficient merely to advise an individual of an organisation’s collection, use or disclosure of their personal information; the requirements for obtaining valid consent must be met: Office of the Australian Information Commissioner, ‘Consent to the Handling of Personal Information’ (Web Page, 2025)

[12]  Mark Rasch, ‘Data Entanglement, AI and Privacy: Why the Law Isn’t Ready’ (Web Page, Security Boulevard, 2025)

 
