Why This Chapter Matters
ChatGPT is extraordinarily capable, but it is also capable of being wrong in ways that look completely right. It writes confidently. It cites sources that do not exist. It produces legal or medical advice that sounds authoritative but may be dangerous. It can reflect the biases baked into its training data. And if you paste the wrong information into it, you may be putting sensitive data at risk.
None of this means ChatGPT is not useful — this entire tutorial series proves that it is. But using it without understanding its limitations is like driving a car without understanding that the brakes can fail on a wet road. This chapter is the safety briefing.
What Are Hallucinations?
In AI terminology, a hallucination is when a language model generates text that is factually incorrect but presented with the same fluent confidence as correct text. The model does not know it is wrong — it has no internal fact-checking mechanism.
Why Do Hallucinations Happen?
To understand this, remember what ChatGPT actually is: a next-token prediction engine trained on an enormous amount of text. It learned patterns of language — which words follow which, how arguments are structured, what a citation looks like — but it did not learn a ground truth database of facts.
When you ask "What is the population of Surat?", the model does not look up an authoritative source. It generates the most statistically plausible answer based on patterns in its training data. If the training data contained inconsistent or outdated population figures, the model has no way to tell which one is right, and it can produce a number that is wrong.
The problem is worsened by the fact that the model is trained to sound helpful and confident. Saying "I'm not sure, please verify" is statistically less common in training data than confidently stating an answer. So the model defaults to confidence even when it should hedge.
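The mechanism can be illustrated with a toy sketch. This is not how ChatGPT is implemented; the candidate answers and their weights below are invented placeholders, chosen only to show that an answer is drawn from a distribution of plausible continuations and is never checked against a source of truth.

```python
import random

# Hypothetical continuations for "The population of Surat is ...".
# Labels and weights are invented for illustration; they are not real data.
CANDIDATES = {
    "figure from an older census": 0.5,
    "figure from a recent estimate": 0.3,
    "figure from an unsourced blog post": 0.2,
}


def sample_answer(rng: random.Random) -> str:
    """Pick one continuation in proportion to its weight."""
    answers = list(CANDIDATES)
    weights = list(CANDIDATES.values())
    return rng.choices(answers, weights=weights, k=1)[0]


def most_likely_answer() -> str:
    """Return the single highest-weighted continuation."""
    return max(CANDIDATES, key=CANDIDATES.get)


if __name__ == "__main__":
    rng = random.Random(0)
    # Every sample reads as a fluent answer; none is verified.
    for _ in range(5):
        print(sample_answer(rng))
    print("Most likely:", most_likely_answer())
```

In this sketch the most common figure wins even if it is outdated, which is the pattern the Surat example describes.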
Types of Hallucinations
| Type | Example |
|---|---|
| Fabricated citations | "According to a 2023 study by IIM Ahmedabad published in the Journal of Finance..." — the study does not exist |
| Wrong statistics | Citing an incorrect GDP figure or population number |
| False historical facts | Getting a date, name, or sequence of events wrong |
| Non-existent people | Inventing a professor or politician and attributing quotes to them |
| Made-up laws or regulations | Citing a section of the Companies Act 2013 that does not exist |
| Plausible-but-wrong code | Code that looks correct and even runs, but produces the wrong output for edge cases |
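The last row of the table is easiest to see with a small example. The sketch below is an illustrative case written for this chapter, not output captured from ChatGPT: the first function looks reasonable and passes casual checks, but it ignores the century rule of the Gregorian calendar.

```python
def is_leap_year_plausible(year: int) -> bool:
    """Looks right and runs fine, but ignores the century rule."""
    return year % 4 == 0


def is_leap_year_correct(year: int) -> bool:
    """Divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    # 1900 is the edge case: divisible by 4, yet not a leap year.
    for year in (2024, 2023, 2000, 1900):
        print(year, is_leap_year_plausible(year), is_leap_year_correct(year))
```

The two functions agree on 2024, 2023, and 2000, and differ only on 1900. A quick spot check would miss the bug, which is why edge-case tests matter for generated code.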
A Real-World Example
In 2023, two American lawyers submitted a legal brief that cited case law generated by ChatGPT (Mata v. Avianca). The cases did not exist, and the court sanctioned both lawyers. This is a high-stakes example of a hallucination, but smaller versions happen every day to people who do not verify.
The Knowledge Cutoff
ChatGPT's training data has a cutoff date. The model knows nothing about events after that date unless you tell it or it uses the web search tool.
As of GPT-4o in 2026: The knowledge cutoff is approximately early 2024. Events, prices, laws, regulations, and company structures that changed after that date are not known to the model.
What this means in practice:
Question: "What is the current interest rate on SBI home loans?"
ChatGPT answer: May quote a rate that was accurate in 2023 but has since changed.
Question: "Who is the current CEO of Infosys?"
ChatGPT answer: May be outdated if there has been a leadership change.
Question: "Is GST applicable on software subscriptions?"
ChatGPT answer: May not reflect the most recent CBIC clarifications.
Rule of thumb: For anything that changes over time — prices, interest rates, tax rules, government policies, sports results, company information — always verify with a current authoritative source.
Confidently Wrong Outputs
The most dangerous category of error is not the obvious mistake — it is the confident, detailed, plausible-sounding wrong answer.
If you ask ChatGPT "What is 2 + 2?" and it says "5", you will catch that immediately. But if you ask "What are the TDS rates under Section 194C for FY 2025-26?" and it gives you a detailed, formatted table with slightly wrong rates, you may not catch the error without cross-referencing the Income Tax Act or the Income Tax Department's website.
High-Risk Domains for Confident Wrong Outputs
- Legal: Contract clauses, court procedures, rights under Indian law, SEBI regulations
- Medical: Drug interactions, dosages, diagnostic criteria
- Financial: Tax slabs, deduction limits, mutual fund SEBI rules, RBI guidelines
- Technical: Specific API versions, library functions that have been deprecated, security configurations
- Academic: Research findings, author attributions, publication dates
In these domains, always treat ChatGPT as a starting point for research, never as the final word.
Bias in Training Data
ChatGPT was trained on text from the internet, books, and other sources. That corpus reflects the biases of the people who wrote it — cultural, political, gender-based, socioeconomic, and geographic.
What This Looks Like in Practice
Geographic bias: Training data is skewed heavily toward English-language Western content. When ChatGPT answers questions about Indian culture, history, or business, it sometimes applies Western frameworks or gets details wrong that an Indian subject matter expert would not.
Gender bias: Historically, occupations like "engineer" or "doctor" produced more male-associated pronouns in early AI systems. While OpenAI has invested in mitigating this, subtle associations still exist.
Prominence bias: Well-documented, widely discussed events are better represented than local or under-reported events. A protest in a tier-2 Indian city that was extensively covered in English-language national media may be known; one covered only in the regional-language press may not be.
Confirmation of majority views: The model tends to reflect the statistical majority of opinions in its training data, which can make minority viewpoints — even when important or correct — less prominent in its outputs.
What This Means for You
- Do not take ChatGPT's framing of cultural or political topics as neutral
- For India-specific regulatory, historical, or cultural questions, cross-reference Indian authoritative sources
- When the stakes are high, get a human expert involved
Privacy Risks: What NOT to Paste into ChatGPT
Every message you send to ChatGPT over the web interface is transmitted to OpenAI's servers. By default, OpenAI may use these conversations to improve future models (you can opt out in Settings).
Never paste into ChatGPT:
- Full names + Aadhaar numbers + PAN numbers of any person
- Customer personal data (names, phone numbers, addresses, purchase history)
- Employee salary details or HR records with identifying information
- Proprietary business data under NDA (client names, contract values, internal strategies)
- Source code from a codebase your employer owns, if confidentiality is required
- Unpublished financial results, earnings data before announcement
- Medical records (yours or a patient's)
- Login credentials, API keys, passwords
The Samsung Leak Case
In 2023, Samsung engineers accidentally leaked confidential chip design information by pasting proprietary source code into ChatGPT to debug it. Samsung subsequently banned ChatGPT use internally. This is a well-documented case of what can happen when employees use AI tools without a clear privacy policy.
Safer Alternatives
- Anonymise before pasting: Replace "Ravi Sharma, Aadhaar 1234 5678 9012" with "Customer A, ID XXXX"
- Use local models: Tools like Ollama let you run an AI model on your own laptop, so data never leaves your device
- Enterprise subscriptions: ChatGPT Enterprise and Microsoft Copilot for Microsoft 365 both offer data agreements where your inputs are not used to train models
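The "anonymise before pasting" step can be partly automated. The following is a minimal sketch, assuming Aadhaar numbers appear as 12 digits (optionally grouped 4-4-4), PANs as five letters, four digits, and one letter, and Indian mobile numbers as 10 digits starting with 6 to 9. The `redact` helper and its placeholder labels are hypothetical. Pattern matching of this kind does not detect names or free-text details, so it complements a manual read-through rather than replacing it.

```python
import re

# Assumed formats (see the paragraph above); real data may vary.
AADHAAR = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")
PAN = re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d)")


def redact(text: str) -> str:
    """Replace structured identifiers with neutral placeholders."""
    # Aadhaar goes first so its 12 digits are not partly matched as a phone.
    text = AADHAAR.sub("[AADHAAR]", text)
    text = PAN.sub("[PAN]", text)
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = (
        "Ravi Sharma, Aadhaar 1234 5678 9012, PAN ABCDE1234F, "
        "phone +91 9876543210, ravi@example.com"
    )
    # The name is left untouched: it still has to be replaced by hand.
    print(redact(sample))
```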
Plagiarism and Copyright Concerns
When You Use ChatGPT's Output as Your Own
If you submit ChatGPT-written text as your own original work (in a client deliverable, a published article, or an academic assignment) without disclosure, this raises plagiarism and dishonesty concerns. The output is usually not plagiarised from a specific human author, but it is also not your own original intellectual work.
Many institutions and employers are developing clear policies on AI disclosure. Follow those policies explicitly.
Copyright of ChatGPT's Output
In most jurisdictions including India, copyright requires a human author. Text generated entirely by an AI model currently does not qualify for copyright protection. This means:
- You cannot copyright purely AI-generated content
- However, if you substantially edit and add to AI output, the original additions may be yours
- This area of law is evolving rapidly — keep up with developments from the Copyright Office of India
What ChatGPT Might Reproduce
While ChatGPT is designed not to reproduce copyrighted text verbatim, it can sometimes output passages that are close to training data. For published creative or journalistic work, run outputs through a plagiarism checker before submitting.
Academic Integrity
If you are a student, this section is critical.
Most Indian universities and coaching institutes are either prohibiting AI-generated submissions or requiring disclosure. The exact rules vary by institution, but the ethical principle is universal: submitting AI-generated work as your own without disclosure is academic dishonesty, equivalent to plagiarism.
What is generally acceptable (varies by institution):
- Using ChatGPT to understand a concept you are struggling with
- Asking it to explain a topic in simpler terms
- Using it to check grammar on an essay you wrote yourself
- Using it to brainstorm ideas, then writing the response yourself
What is generally not acceptable:
- Pasting the assignment question → copying the output → submitting as your answer
- Having ChatGPT write your thesis chapter
- Generating lab reports or case studies with AI without disclosure
- Using it in closed-book exams (even if technically possible on a phone)
AI Detection Tools
Institutions increasingly use AI detection tools such as Turnitin's AI detection feature. These are imperfect — they produce false positives and can be fooled by paraphrasing — but their use is growing. The more important reason to be honest is integrity, not just detection risk.
How to Verify ChatGPT's Answers
The verification principle: Use ChatGPT to find the answer; use authoritative sources to confirm it.
For Indian Legal and Regulatory Questions
| Topic | Authoritative Source |
|---|---|
| Income Tax | incometax.gov.in |
| GST | gst.gov.in |
| SEBI regulations | sebi.gov.in |
| RBI policy, repo rate | rbi.org.in |
| Company law (MCA) | mca.gov.in |
| Labour law | labour.gov.in |
| Consumer protection | consumeraffairs.nic.in |
For Financial Data
- BSE/NSE for stock prices: bseindia.com, nseindia.com
- Mutual fund NAVs: amfiindia.com (AMFI official)
- Gold prices: mcxindia.com or goodreturns.in
- Loan rates: the bank's own website directly
For Medical and Health Information
Always consult a qualified doctor. Online sources like www.nhp.gov.in (National Health Portal of India) are more reliable than AI summaries for health topics.
The Citation Test
When ChatGPT cites a study, paper, or news article, search for it yourself. If you cannot find the exact citation on Google Scholar or the publisher's website within 2 minutes, assume it may be fabricated.
Verification prompt:
"Verify the claim you just made. What specific sources would I check
to confirm this? Provide the exact URL or publication name."
This forces the model to either give you checkable sources or acknowledge uncertainty.
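If you run the citation test often, a small helper can build the lookup links for you. This is a sketch only: the function names are hypothetical, and it assumes the public Google Scholar search page and the Crossref REST API `works` endpoint with its `query.bibliographic` parameter. It only constructs URLs; opening them and judging the results is still a manual step.

```python
from urllib.parse import urlencode


def scholar_search_url(citation: str) -> str:
    """Build a Google Scholar URL that searches for the exact phrase."""
    return "https://scholar.google.com/scholar?" + urlencode({"q": f'"{citation}"'})


def crossref_search_url(citation: str, rows: int = 3) -> str:
    """Build a Crossref bibliographic query URL (assumed endpoint, see above)."""
    params = {"query.bibliographic": citation, "rows": rows}
    return "https://api.crossref.org/works?" + urlencode(params)


if __name__ == "__main__":
    title = "Example Study Title"  # placeholder, not a real publication
    print(scholar_search_url(title))
    print(crossref_search_url(title))
```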
Common Pitfalls
1. Treating ChatGPT as a search engine: Search engines index real web pages. ChatGPT generates plausible text. These are fundamentally different: one retrieves, the other predicts. Treat every factual claim as unverified until you check it.
2. Using it for medical or legal decisions: Questions like "Can I take ibuprofen with my current prescription?" or "Is my employer's notice period clause legal?" require a qualified professional. ChatGPT can explain concepts, but the stakes of a wrong answer are too high to rely on an AI.
3. Not opting out of data training: By default, OpenAI may use your conversations for training. Go to Settings → Data Controls and toggle off "Improve the model for everyone" if you handle any sensitive topics.
4. Assuming that a long, detailed answer is a correct answer: Length and detail increase perceived credibility. A 500-word confident hallucination is more dangerous than a 10-word wrong answer because it is harder to notice.
5. Copy-pasting outputs without reading: This is the most basic version of the hallucination problem. Even if you intend to verify later, reading the output before you submit it catches most obvious errors.
6. Ignoring the knowledge cutoff for time-sensitive queries: Whenever your question contains the word "current", "latest", "now", or "today", or a specific recent year, verify with a live source rather than trusting the training data.
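The reminder in pitfall 6 can be written down as a simple check. The sketch below is illustrative: the function name is hypothetical, the trigger words are the ones listed above, and the default `cutoff_year` of 2024 is an assumption taken from this chapter's stated cutoff. It flags a question for live verification; it does not verify anything itself.

```python
import re

# Trigger words from pitfall 6.
TIME_WORDS = ("current", "latest", "now", "today")
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def needs_live_source(question: str, cutoff_year: int = 2024) -> bool:
    """Return True if the question looks time-sensitive."""
    lowered = question.lower()
    # Whole-word match, so "knowledge" does not trigger on "now".
    if any(re.search(rf"\b{word}\b", lowered) for word in TIME_WORDS):
        return True
    # Any year at or after the assumed cutoff also counts.
    years = [int(match) for match in YEAR_PATTERN.findall(question)]
    return any(year >= cutoff_year for year in years)


if __name__ == "__main__":
    for q in (
        "What is the current repo rate?",
        "What are the TDS rates for FY 2025-26?",
        "What is knowledge distillation?",
    ):
        print(q, "->", needs_live_source(q))
```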
Practice Exercises
- Ask ChatGPT to cite three academic studies supporting a claim of your choice (any topic). Then search for each citation on Google Scholar. Report what you find: do all three exist?
- Ask ChatGPT "What is the current TDS rate under Section 194C for individual contractors?" Then verify the answer on incometax.gov.in. Note any discrepancy.
- Look at your ChatGPT data settings (Settings → Data Controls). Is "Improve the model for everyone" currently on or off? Change it to your preferred setting and note what changed.
- Find an example of a public case or news story where someone was misled by AI-generated information (a quick web search for "AI hallucination news" will surface several). Write a 150-word summary of what happened and what could have prevented it.
- Write a short paragraph on a topic you know well (your field of study or work). Then ask ChatGPT to write a paragraph on the same topic. Compare the two for accuracy. Identify at least one thing ChatGPT got subtly wrong or oversimplified.
Summary
- Hallucinations occur because ChatGPT predicts plausible text, not verified facts — it has no internal fact-checking mechanism and will state incorrect information with full confidence
- The knowledge cutoff means anything that changed after early 2024 is unknown to the model — always verify current prices, rates, laws, and leadership information
- Confidently wrong outputs are the most dangerous category — detailed, structured, fluent hallucinations are harder to spot than simple errors
- Training data biases are real: geographic, gender, and cultural biases affect ChatGPT's outputs, particularly for under-represented topics and non-Western contexts
- Never paste personal identifying information, proprietary data, confidential code, or financial data that is not yours to share into the public ChatGPT interface
- Plagiarism and copyright: AI-generated text is generally not copyrightable, and submitting it as your own without disclosure violates academic integrity norms
- Verification is non-negotiable for high-stakes domains — use Indian government websites, official regulators, and peer-reviewed sources to confirm any factual claim ChatGPT makes